🤖 AI Summary
This study addresses the challenge of automating greenwashing detection, i.e., identifying misleading corporate sustainability claims. We propose the first fact-centered detection framework, built upon EmeraldGraph, a domain-specific knowledge graph for ESG (Environmental, Social, and Governance) data. Our method integrates structured extraction from multi-source ESG reports, retrieval-augmented generation (RAG), and zero-shot large language model (LLM) reasoning, requiring no fine-tuning for claim verification. Key contributions include: (1) a verification paradigm anchored in verifiable facts; (2) an evidence-driven, interpretable decision mechanism; and (3) a transparent decision process that supports justified abstention when claims cannot be verified. Evaluated on a novel greenwashing benchmark dataset, our approach significantly outperforms general-purpose LLMs in accuracy, coverage, and explanation quality, bridging critical gaps in domain-knowledge integration and explainability for sustainability claim validation.
📝 Abstract
As AI and web agents become pervasive in decision-making, it is critical to design intelligent systems that not only support sustainability efforts but also guard against misinformation. Greenwashing, i.e., misleading corporate sustainability claims, poses a major challenge to environmental progress. To address this challenge, we introduce EmeraldMind, a fact-centric framework integrating a domain-specific knowledge graph with retrieval-augmented generation to automate greenwashing detection. EmeraldMind builds the EmeraldGraph from diverse corporate ESG (environmental, social, and governance) reports, surfacing verifiable evidence, often missing in generic knowledge bases, and supporting large language models in claim assessment. The framework delivers justification-centric classifications, presenting transparent, evidence-backed verdicts and abstaining responsibly when claims cannot be verified. Experiments on a new greenwashing claims dataset demonstrate that EmeraldMind achieves competitive accuracy, greater coverage, and superior explanation quality compared to generic LLMs, without the need for fine-tuning or retraining.
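The verify-or-abstain loop described above can be sketched in miniature. Everything here is an illustrative assumption, not the paper's implementation: the toy fact list stands in for EmeraldGraph, keyword matching stands in for RAG retrieval, and a simple contradiction rule stands in for zero-shot LLM reasoning. The hypothetical names (`verify_claim`, `KG_FACTS`, `AcmeCorp`) do not come from the paper.

```python
from dataclasses import dataclass

# Toy stand-in for EmeraldGraph: (subject, relation, object) facts
# as might be extracted from corporate ESG reports.
KG_FACTS = [
    ("AcmeCorp", "scope1_emissions_change_2023", "+12%"),
    ("AcmeCorp", "renewable_energy_share_2023", "18%"),
]

@dataclass
class Verdict:
    label: str       # "supported", "greenwashing", or "abstain"
    evidence: list   # facts the verdict cites (empty when abstaining)

def retrieve(claim: str):
    """Keyword retrieval standing in for KG-backed RAG retrieval."""
    words = set(claim.lower().split())
    return [fact for fact in KG_FACTS if fact[0].lower() in words]

def verify_claim(claim: str) -> Verdict:
    evidence = retrieve(claim)
    if not evidence:
        # Justified abstention: no verifiable facts means no verdict.
        return Verdict("abstain", [])
    # Stand-in for zero-shot LLM reasoning over retrieved facts:
    # a claim of reduced emissions contradicts a recorded increase.
    contradicted = "reduced" in claim.lower() and any(
        "emissions" in rel and val.startswith("+")
        for _, rel, val in evidence
    )
    label = "greenwashing" if contradicted else "supported"
    return Verdict(label, evidence)
```

Under these assumptions, a claim about a company absent from the graph yields an `abstain` verdict with empty evidence, while a contradicted claim returns `greenwashing` together with the facts that refute it, mirroring the justification-centric output the framework describes.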