Supporting ESG Work with Data Science

January 23, 2023

ESG planning, tracking, and reporting are both labor-intensive and time-consuming, but with new regulations beginning to standardize the disclosure of environmental, social, and governance factors, the work has to be done. As a result, there has been growth in sustainability software that uses data science methods (e.g., machine learning and AI) to facilitate tracking and reporting. Data science can help in many ways, but it does have its limits.

The term “data science” often gets thrown around as if it were a one-stop solution for every outstanding challenge. Want to know how your product will affect the environmental footprint of your target customers? Data science. Want to screen promising new sustainable technologies? Data science. We like to think that the more data we throw at a problem, the more answers we’ll get. However, integrating with a million existing systems to “unlock your data” will only get you so far if you’re not intentional about the data you’re gathering. It can even raise more questions than it answers, making it harder to build a shared understanding. To meet your objectives, you need to identify the data that will actually answer your questions, and that requires genuine domain expertise.

What’s more, the typical “black box” approach, in which you feed your data into a system that performs incomprehensible transformations and returns an unverifiable output, will not be sufficient in this new era of disclosures. Stakeholders need to genuinely understand the key indicators and assumptions that drive the results, and there needs to be a clear line of sight back to the original data so that the results can be validated and trusted. In a thorough study we recently conducted with the help of CRANE’s users, accountability and transparency were the top priorities. That said, despite its limits, data science can help in several ways throughout the process.

Data Collection

Establishing your baseline metrics as outlined in standards like SASB, TCFD, or GRI is one of the most arduous undertakings for any ESG program. When reporting on these metrics, you need to back up your data with auditable documentation, i.e., evidence. The number of documents you may end up sifting through can be immense, making the task both time-consuming and overwhelming. Natural language processing (NLP) can help by automating the identification and organization of the relevant documents you need to paint a clear picture of your current ESG performance, establish data baselines, and identify blind spots.
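
To make the document-identification step more concrete, here is a minimal sketch in Python that ranks a set of documents by how relevant they are to a given metric. It uses simple TF-IDF keyword scoring from the standard library as a stand-in for the richer NLP models a production system would likely use; the function names, file names, and document contents are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical example: TF-IDF keyword scoring as a simplified stand-in
# for the NLP models a real document-triage system might use.


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(docs: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Rank documents by TF-IDF-weighted matches against the query terms."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / total
                # Smoothed inverse document frequency: rarer terms weigh more.
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
                score += tf * idf
        scores.append((name, score))

    # Most relevant documents first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical document snippets, e.g. text extracted from uploaded files.
docs = {
    "utility_bill_2022.txt": "Electricity usage 2022: 1,200 MWh, invoice from regional utility",
    "hr_handbook.txt": "Employee leave policy and code of conduct",
    "fleet_fuel_log.txt": "Diesel fuel purchases for the delivery fleet, liters per month",
}

ranking = rank_documents(docs, "electricity usage MWh")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice you would run a query like this for each baseline metric and route the top-ranked documents to a reviewer, rather than treating the scores as a final judgment of relevance.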

Data Analysis

Once you’ve collected your data, whether documents supporting your baseline metrics or survey responses from materiality assessments, your structured and unstructured data can be transformed into data-driven insights. Forward-looking predictions and scenario analyses can help your organization evaluate what it can realistically achieve and directly inform the strategic vision and long-term goals of your ESG program. Supply chain optimization can identify major risks and actionable steps to improve your results. As you track progress, analytics can help you determine whether you are on course to meet your ESG goals and where corrections are needed. But graphs, tables, and automated insights are only useful if they translate into relevant actions and better decisions on the ground. Remember, ESG is really a story about people and things, and we shouldn’t allow ourselves to get lost in abstract concepts or convoluted metrics.
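
As a simple illustration of the tracking question (“are we on course to meet our goal?”), the sketch below fits a least-squares linear trend to past annual values and projects it to a target year. The emissions figures and the 2030 target are invented for illustration, and a straight-line trend is an assumption; real forward-looking and scenario analyses usually account for planned initiatives, business growth, and uncertainty.

```python
# Hypothetical example: project a metric forward with a straight-line
# (least-squares) trend and compare the projection with a target.


def linear_trend(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Return (slope, mean_x, mean_y) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, mean_x, mean_y


def project_to_target(years, values, target_year, target_value):
    """Project the trend to target_year and report whether it meets target_value.

    Assumes lower is better (e.g. emissions).
    """
    slope, mean_x, mean_y = linear_trend(years, values)
    # The least-squares line always passes through (mean_x, mean_y).
    projected = mean_y + slope * (target_year - mean_x)
    return projected, projected <= target_value


# Invented annual emissions in tCO2e, for illustration only.
years = [2019, 2020, 2021, 2022]
emissions = [1000, 950, 900, 850]

projected, on_track = project_to_target(years, emissions, 2030, 500)
print(f"Projected 2030 emissions: {projected:.0f} tCO2e, on track: {on_track}")
```

Here the trend projects 450 tCO2e for 2030 against a 500 tCO2e target, so the hypothetical program looks on track; a projection above the target would flag the need for corrective action.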

Reporting

You’ll eventually need to produce a full report to communicate your ESG practices effectively, but we often advise organizations in the early stages of their ESG journey to start with an interim ESG report. It should outline what you have done to kickstart your journey, list next steps, and state the long-term goals you are actively pursuing. The focus of this report should be transparency and concrete aspirations, supported by data that provides evidence of your efforts so far. That data should be verifiable, meaning there is a clear connection to primary sources (e.g., auditable documentation). A good data management system, such as the one provided by Gemini, will help your organization lay a trail of breadcrumbs from source documentation to reported values so that you can disclose your ESG practices with confidence.
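
The “breadcrumbs” idea can be sketched as a small data model in which every reported value carries links to the source documents that support it, which makes gaps in evidence easy to find before disclosure. This is a hypothetical structure for illustration only, not the data model of Gemini or any other product; the class names, metrics, and document IDs are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical data model: every reported value keeps links to the source
# documents that back it up, so the audit trail travels with the number.


@dataclass(frozen=True)
class SourceDocument:
    doc_id: str
    description: str


@dataclass
class ReportedValue:
    metric: str
    value: float
    unit: str
    sources: list[SourceDocument] = field(default_factory=list)


def unsupported_values(report: list[ReportedValue]) -> list[str]:
    """Return the metrics that have no linked source document yet."""
    return [rv.metric for rv in report if not rv.sources]


invoices = SourceDocument("DOC-001", "2022 electricity invoices from utility")
report = [
    ReportedValue("Total energy consumed", 4320.0, "GJ", [invoices]),
    ReportedValue("Employee turnover rate", 12.5, "%"),
]

print(unsupported_values(report))  # ['Employee turnover rate']
```

Before publishing, a check like this can list every figure that still lacks evidence, so gaps are closed before disclosure rather than discovered during an audit.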

So is data science the panacea for your ESG questions and challenges? Probably not. But data science tools and methods can help you implement and manage your organization’s ESG strategy. They can automate parts of the data collection process. Combined with ESG domain expertise, analytical insights can be turned into decisions and actions. And a good data management system makes it easy to connect reported disclosures to primary sources. Ultimately, it is data science paired with ESG expertise, not data science alone, that will help you answer your ESG questions.