Judging It, Washing It: Scoring and Greenwashing Corporate Climate Disclosures using Large Language Models

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) pose dual risks in corporate climate disclosure assessment—both as evaluators vulnerable to greenwashing and as generators capable of producing deceptive climate claims. Method: We propose the LLM-as-a-Judge framework, implementing two evaluation paradigms: numerical scoring and pairwise comparison. Through prompt engineering and adversarial response generation, we systematically evaluate LLMs’ robustness against LLM-generated misleading climate statements. Contribution/Results: This work is the first to demonstrate that a single LLM can simultaneously perform objective climate disclosure assessment and actively generate greenwashing content, exposing critical vulnerabilities in conventional scalar-scoring approaches. Empirical results show that the pairwise comparison paradigm achieves significantly higher discrimination accuracy and stability under adversarial perturbations compared to numerical scoring. Our study establishes a novel methodology and empirical benchmark for trustworthy climate disclosure evaluation, advancing both methodological rigor and practical accountability in ESG assessment.

📝 Abstract
We study the use of large language models (LLMs) to both evaluate and greenwash corporate climate disclosures. First, we investigate the use of the LLM-as-a-Judge (LLMJ) methodology for scoring company-submitted reports on emissions reduction targets and progress. Second, we probe the behavior of an LLM when it is prompted to greenwash a response subject to accuracy and length constraints. Finally, we test the robustness of the LLMJ methodology against responses that may be greenwashed using an LLM. We find that two LLMJ scoring systems, numerical rating and pairwise comparison, are effective in distinguishing high-performing companies from others, with the pairwise comparison system showing greater robustness against LLM-greenwashed responses.
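The pairwise comparison paradigm described above can be sketched as a simple round-robin tournament: a judge is asked to pick the stronger of two disclosures, and responses are ranked by total wins. The sketch below is a minimal, hypothetical illustration, not the paper's implementation; the `judge` callable stands in for a wrapped LLM prompt, and the toy judge used here (preferring the more specific, longer disclosure) is an assumption for demonstration only.

```python
def pairwise_rank(responses, judge):
    """Rank responses by round-robin pairwise comparison.

    `judge(a, b)` is any callable returning True if response `a` is
    judged stronger than `b` (e.g., an LLM prompted to compare two
    climate disclosures). Hypothetical interface, for illustration.
    """
    wins = {i: 0 for i in range(len(responses))}
    for i in range(len(responses)):
        for j in range(i + 1, len(responses)):
            # Compare each unordered pair once; award a win to the
            # response the judge prefers.
            if judge(responses[i], responses[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    # Higher win count = stronger disclosure; stable sort breaks ties
    # by original order.
    return sorted(range(len(responses)), key=lambda i: -wins[i])

# Toy stand-in for an LLM judge: prefers the longer (more specific)
# disclosure. A real judge would be an LLM API call.
toy_judge = lambda a, b: len(a) > len(b)
ranking = pairwise_rank(
    ["We aim to be green.",
     "We cut Scope 1 emissions 12% vs 2020 baseline.",
     "Net zero someday."],
    toy_judge,
)
# ranking is [1, 0, 2]: the concrete disclosure wins both comparisons.
```

Because each response is compared against every other, a single inflated score cannot dominate the ranking the way it can under scalar numerical rating, which is one intuition for the robustness result reported above.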
Problem

Research questions and friction points this paper is trying to address.

Evaluate corporate climate disclosures
Greenwash climate disclosures
Test robustness of scoring methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs evaluate climate disclosures
LLMs generate greenwashed responses under accuracy and length constraints
Robust scoring with pairwise comparison
Marianne Chuang
UC Santa Cruz
Gabriel Chuang
Columbia University
Cheryl Chuang
UC Santa Cruz
John Chuang
Professor, UC Berkeley School of Information
Research areas: ICT for Sustainability, Biosensory Computing, Passthoughts, Peer Production, Security Economics