Clear Sky Science

Interpretable predictive model for listed companies ESG greenwashing based on XGBoost and SHAP


Why company "green" claims matter

When a company boasts about its environmental or social achievements, investors and the public have to ask: is this real progress or just clever marketing? This gap between image and reality is called greenwashing. As more money flows into investments that promise to be environmentally and socially responsible, being able to spot greenwashing early becomes crucial for protecting investors, guiding policy, and rewarding genuinely responsible firms.

Looking under the hood of green promises

The study focuses on companies listed in China and looks at their Environmental, Social, and Governance (ESG) behavior from 2009 to 2022. Instead of taking firms' sustainability reports at face value, the authors compare two independent types of ESG ratings. One, from Bloomberg, captures how much ESG information a company discloses in public reports. The other, from Chinese and international rating agencies, captures how well the company actually performs on ESG issues in practice. By standardizing both scores and subtracting the performance score from the disclosure score, the researchers build a greenwashing index: firms that talk much more than they act are flagged as likely greenwashers.
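The index construction can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the scores are made-up numbers for four hypothetical firms, standardization is done across firms within a single year, and treating a positive index as a greenwashing flag is a hypothetical labelling rule. The summary does not say which threshold or standardization group the authors actually use.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize scores to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def greenwashing_index(disclosure, performance):
    """Standardized disclosure score minus standardized performance score.

    Positive values mean a firm discloses more, relative to its peers,
    than its ESG performance would justify: it "talks more than it acts".
    """
    z_disc = zscores(disclosure)
    z_perf = zscores(performance)
    return [d - p for d, p in zip(z_disc, z_perf)]


# Hypothetical scores for four firms in one year (not data from the study).
bloomberg_disclosure = [60.0, 45.0, 30.0, 55.0]
agency_performance = [50.0, 70.0, 40.0, 20.0]

gw = greenwashing_index(bloomberg_disclosure, agency_performance)
# Hypothetical labelling rule: flag any firm with a positive index.
flags = [g > 0 for g in gw]
for firm, (g, f) in enumerate(zip(gw, flags)):
    print(f"firm {firm}: index={g:+.2f} greenwashing={f}")
```

Because both scores are standardized, the index compares each firm with its peers rather than relying on the raw rating scales, which differ between providers.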

Teaching a model to recognize greenwashing patterns

To move beyond simple averages and rules of thumb, the authors use XGBoost (extreme gradient boosting), a modern machine-learning method that builds an ensemble of decision trees and is especially good at finding patterns in complex, messy data. They feed the model 16 different characteristics for each company. These include financial health (such as profits, debt levels, growth and size), governance structure (such as how concentrated ownership is and how many directors sit on the board), and external pressures (such as how strict local regulation is, how intense media scrutiny appears to be, and how competitive the firm's market is). The goal is to see whether the model can accurately predict which companies will be greenwashing one or two years later.
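Predicting greenwashing one or two years ahead means pairing each firm's characteristics in year t with its greenwashing label in year t + 1 or t + 2. The sketch below shows one way to build such lagged samples from a firm-year panel. The panel layout, the helper name `build_lagged_samples`, and the two toy features (standing in for the study's 16) are all hypothetical.

```python
def build_lagged_samples(panel, lag):
    """Pair firm features in year t with the greenwashing label in year t + lag.

    `panel` maps (firm_id, year) to {"features": [...], "label": 0 or 1}.
    Firm-years whose target year is not observed are skipped.
    """
    X, y = [], []
    for (firm, year), record in sorted(panel.items()):
        target = panel.get((firm, year + lag))
        if target is not None:
            X.append(record["features"])
            y.append(target["label"])
    return X, y


# Hypothetical firm-year panel with two features (leverage, current ratio).
panel = {
    ("A", 2019): {"features": [0.45, 1.2], "label": 0},
    ("A", 2020): {"features": [0.52, 1.8], "label": 1},
    ("A", 2021): {"features": [0.60, 2.5], "label": 0},
    ("B", 2020): {"features": [0.30, 0.9], "label": 0},
    ("B", 2022): {"features": [0.35, 1.1], "label": 1},
}

X1, y1 = build_lagged_samples(panel, lag=1)  # predict one year ahead
X2, y2 = build_lagged_samples(panel, lag=2)  # predict two years ahead
print(X1, y1)
print(X2, y2)
```

In this sketch, firm-years without an observed target year (for example firm B in 2020 with a one-year lag) are dropped rather than imputed. The resulting X and y are the kind of input a classifier such as `xgboost.XGBClassifier` would be fitted on. That step is left out here to keep the example dependency-free.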

Figure 1.

How well can we predict who is greenwashing?

The best-performing model uses company characteristics from the previous year. It correctly identifies about 87% of greenwashing firms and separates them from non-greenwashers much better than random guessing. Financial indicators turn out to be especially informative: firms with extremely high or extremely low debt burdens, unusually ample short-term liquidity, or unusually high market valuations are more likely to be flagged as greenwashers. Larger and faster-growing companies, by contrast, are less likely to be predicted as greenwashers, possibly because their visibility and long-term outlook make deceptive signaling riskier. Governance indicators also matter: when the largest shareholder owns a bigger slice of the company, greenwashing becomes less likely, whereas higher shareholdings by supervising insiders are associated with more greenwashing.
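Two standard ways to quantify claims like these are recall (the share of actual greenwashers the model flags) and the area under the ROC curve (AUC). AUC is 0.5 for random guessing and 1.0 for perfect separation. The summary does not state exactly which metric the 87% figure refers to. The sketch below therefore only illustrates the two measures, using hypothetical labels, hypothetical predicted probabilities, and a hypothetical 0.5 cut-off.

```python
def recall(y_true, y_pred):
    """Share of actual positives (greenwashers) that the model flags."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / sum(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties count as half a win; random guessing gives 0.5 on average.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = greenwashing) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cut-off

print("recall:", recall(y_true, y_pred))  # 3 of 4 greenwashers flagged
print("AUC:", roc_auc(y_true, scores))
```

Recall depends on the chosen cut-off, while AUC summarizes ranking quality across all cut-offs. That is why the two are often reported together.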

Opening the black box of artificial intelligence

A common worry about machine learning is that it can be accurate yet opaque. To address this, the study uses a technique called SHAP (SHapley Additive exPlanations), which decomposes each prediction into additive contributions from individual features. In practical terms, this means the researchers can say not only that a given company is likely greenwashing, but also that, for example, its unusually high current ratio and low ownership concentration were key reasons. The analysis confirms that financial indicators, especially the current ratio and agency costs, together with how ownership is distributed, carry more weight than simple governance headcounts such as board size. The model still works well when ESG performance is measured by alternative rating providers and even when applied to companies listed in Hong Kong, suggesting that the approach is robust across different rating systems and disclosure environments.
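SHAP is built on Shapley values from cooperative game theory. Each feature's contribution is its marginal effect on the prediction, averaged over all orders in which the features could be added. The brute-force sketch below computes exact Shapley values for a hypothetical three-feature scoring function against a single reference firm. It is not the TreeSHAP algorithm that the `shap` library uses for tree ensembles such as XGBoost, which is much faster and typically averages over a background dataset. The model, feature names, firm values and baseline are all made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    Features outside a coalition are set to their baseline values, so the
    contributions sum to model(x) - model(baseline). Every coalition is
    visited, so cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take the firm's values; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_greenwashing_score(z):
    """Hypothetical scoring function, not the study's fitted model."""
    current_ratio, top_holder_share, leverage = z
    # U-shaped leverage term: both very high and very low debt raise the score.
    return 0.8 * current_ratio - 1.5 * top_holder_share + 2.0 * (leverage - 0.5) ** 2


names = ["current_ratio", "top_holder_share", "leverage"]
firm = [2.5, 0.20, 0.85]       # hypothetical firm under review
reference = [1.5, 0.35, 0.45]  # hypothetical "typical" firm used as the baseline

phi = shapley_values(toy_greenwashing_score, firm, reference)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")
print("sum:", round(sum(phi), 3),
      "score gap:", round(toy_greenwashing_score(firm) - toy_greenwashing_score(reference), 3))
```

By construction the contributions add up to the gap between the firm's score and the baseline score. This additivity is what lets an analyst make statements such as "the high current ratio pushed this firm toward a greenwashing flag". Exhaustive enumeration is only workable for a handful of features.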

Figure 2.

What this means for investors and watchdogs

In everyday terms, the study shows that it is possible to build a reliable early-warning system for ESG greenwashing using information that is already widely available in financial databases. By combining multiple ESG ratings with rich company data and an interpretable machine-learning model, regulators could target their oversight more efficiently, investors could better avoid glossy but hollow ESG stories, and companies would face stronger pressure to back up their claims with real action. While the work focuses on China, the basic recipe—compare what firms say with what they do, and let transparent algorithms learn the patterns—offers a blueprint for cleaning up ESG claims in markets around the world.

Citation: Jianfeng, Z., Tiantian, Q. Interpretable predictive model for listed companies ESG greenwashing based on XGBoost and SHAP. Sci Rep 16, 12899 (2026). https://doi.org/10.1038/s41598-026-42004-1

Keywords: ESG greenwashing, sustainable investing, machine learning, corporate governance, financial disclosure