🤖 AI Summary
Large language models (LLMs) pose dual risks in corporate climate disclosure assessment—both as evaluators vulnerable to greenwashing and as generators capable of producing deceptive climate claims.
Method: We apply the LLM-as-a-Judge (LLMJ) framework under two evaluation paradigms: numerical scoring and pairwise comparison. Through prompt engineering and adversarial response generation, we systematically test the judge's robustness against LLM-generated misleading climate statements.
Contribution/Results: This work demonstrates that a single LLM can both assess climate disclosures and actively generate greenwashed content, exposing vulnerabilities in scalar-scoring approaches. Empirically, both paradigms distinguish high-performing companies from others, but pairwise comparison remains more accurate and stable under adversarial (greenwashed) responses than numerical scoring. The study contributes a methodology and empirical benchmark for trustworthy climate disclosure evaluation, supporting both methodological rigor and practical accountability in ESG assessment.
📝 Abstract
We study the use of large language models (LLMs) to both evaluate and greenwash corporate climate disclosures. First, we investigate the use of the LLM-as-a-Judge (LLMJ) methodology for scoring company-submitted reports on emissions reduction targets and progress. Second, we probe the behavior of an LLM when it is prompted to greenwash a response subject to accuracy and length constraints. Finally, we test the robustness of the LLMJ methodology against responses that may be greenwashed using an LLM. We find that two LLMJ scoring systems, numerical rating and pairwise comparison, are effective in distinguishing high-performing companies from others, with the pairwise comparison system showing greater robustness against LLM-greenwashed responses.
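The two LLMJ scoring systems named above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `judge_score` and `judge_prefer` stand in for actual LLM API calls and are stubbed here with a deterministic keyword heuristic (the keyword list is an assumption chosen only to make the example runnable), so that the contrast between numerical rating and round-robin pairwise comparison is concrete.

```python
from itertools import combinations

# Hypothetical judge. In practice this would prompt an LLM with a scoring
# rubric; here it is stubbed with a keyword heuristic so the sketch runs
# offline and deterministically.
def judge_score(report: str) -> int:
    """Numerical-rating paradigm: assign one report a score from 1 to 5."""
    signals = ("baseline year", "interim target", "scope 3", "third-party verified")
    return 1 + sum(s in report.lower() for s in signals)

def judge_prefer(a: str, b: str) -> str:
    """Pairwise-comparison paradigm: return the preferred of two reports."""
    return a if judge_score(a) >= judge_score(b) else b

def rank_pairwise(reports: list[str]) -> list[str]:
    """Rank reports by wins in a round-robin of pairwise judgments."""
    wins = {r: 0 for r in reports}
    for a, b in combinations(reports, 2):
        wins[judge_prefer(a, b)] += 1
    return sorted(reports, key=wins.get, reverse=True)

strong = ("We report Scope 3 emissions against a 2019 baseline year, "
          "set an interim target for 2030, and are third-party verified.")
weak = "We are committed to sustainability and a greener future."

print(judge_score(strong), judge_score(weak))  # 5 1
print(rank_pairwise([weak, strong])[0] == strong)  # True
```

One intuition for the robustness result: a greenwashed response must shift an absolute score to fool the numerical-rating judge, whereas under pairwise comparison it must also beat a genuinely strong report head-to-head, which is a harder target to hit.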