ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge

📅 2025-06-02
🤖 AI Summary
This study addresses the lack of specialized benchmarks for evaluating large language models' (LLMs') knowledge comprehension and reasoning in environmental, social, and governance (ESG) and sustainability domains. We introduce ESGenius, the first dedicated benchmark, comprising 1,136 expert-validated multiple-choice questions (ESGenius-QA) and a corpus of 231 authoritative ESG documents (ESGenius-Corpus); it supports both zero-shot and retrieval-augmented generation (RAG) evaluation. The method explicitly links each question to its source text, which makes the assessment interpretable and supports RAG optimization; builds questions with an LLM-generated, expert-verified construction process; and evaluates models across a wide range of scales. Experiments on 50 LLMs, spanning roughly three orders of magnitude in parameter count (0.5B to 671B), show zero-shot accuracy typically around 55% to 70%, while RAG substantially improves performance (e.g., DeepSeek-R1-Distill-Qwen-14B rises from 63.82% to 80.46%), underscoring the value of injecting authoritative domain-specific knowledge.
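The two evaluation modes and the question–source link can be illustrated with a small harness. This is a minimal sketch, not the authors' code: the `QAItem` schema, the prompt wording, the `stub_model` function, and the keyword-overlap `retrieve` function are all hypothetical stand-ins (a real RAG setup would chunk the 231 corpus documents into passages and use a proper retriever). Zero-shot mode sends only the question and its options; RAG mode prepends retrieved text. Because each item records the id of its source document, the harness can also report how often retrieval surfaced that source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QAItem:
    """One multiple-choice question (hypothetical schema, not the released format)."""
    question: str
    options: Dict[str, str]  # option letter -> option text
    answer: str              # gold option letter, e.g. "A"
    source_id: str           # id of the corpus document the question was drawn from


def build_prompt(item: QAItem, context: Optional[List[str]] = None) -> str:
    """Zero-shot prompt when context is None; RAG prompt when passages are given."""
    lines: List[str] = []
    if context:
        lines.append("Context:")
        lines.extend(context)
        lines.append("")
    lines.append(f"Question: {item.question}")
    for letter in sorted(item.options):
        lines.append(f"{letter}. {item.options[letter]}")
    lines.append("Answer with the option letter only.")
    return "\n".join(lines)


def retrieve(query: str, corpus: Dict[str, str], k: int = 1) -> List[str]:
    """Toy keyword-overlap retriever returning the ids of the top-k documents.
    A hypothetical stand-in for a real retriever."""
    terms = set(query.lower().split())

    def overlap(doc_id: str) -> int:
        return len(terms & set(corpus[doc_id].lower().split()))

    return sorted(corpus, key=overlap, reverse=True)[:k]


def evaluate(items: List[QAItem], model: Callable[[str], str],
             corpus: Optional[Dict[str, str]] = None, k: int = 1) -> Dict[str, float]:
    """Accuracy (%) in zero-shot mode (corpus=None) or RAG mode; in RAG mode also the
    share of questions whose linked source document was among the retrieved ids."""
    correct = hits = 0
    for item in items:
        context = None
        if corpus is not None:
            doc_ids = retrieve(item.question, corpus, k)
            hits += item.source_id in doc_ids
            context = [corpus[d] for d in doc_ids]
        reply = model(build_prompt(item, context)).strip().upper()
        correct += reply[:1] == item.answer
    n = len(items)
    result = {"accuracy": 100 * correct / n}
    if corpus is not None:
        result["source_hit_rate"] = 100 * hits / n
    return result


# Toy data for illustration only; not taken from ESGenius.
DEMO_CORPUS = {
    "doc-a": "scope 1 emissions are direct emissions from sources owned or controlled by the company",
    "doc-b": "board independence is a core governance topic",
}
DEMO_ITEM = QAItem(
    question="Which emissions come from sources owned or controlled by the company?",
    options={"A": "Scope 1", "B": "Scope 3"},
    answer="A",
    source_id="doc-a",
)


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: answers correctly only when given context."""
    return "A" if prompt.startswith("Context:") else "B"


if __name__ == "__main__":
    print(evaluate([DEMO_ITEM], stub_model))               # zero-shot
    print(evaluate([DEMO_ITEM], stub_model, DEMO_CORPUS))  # RAG
```

In the toy run the stub answers wrongly without context and correctly with it, a miniature version of the zero-shot versus RAG contrast. Reporting `source_hit_rate` alongside accuracy is one way to separate retrieval failures from reasoning failures.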

📝 Abstract
We introduce ESGenius, a comprehensive benchmark for evaluating and enhancing the proficiency of Large Language Models (LLMs) in Environmental, Social and Governance (ESG) and sustainability-focused question answering. ESGenius comprises two key components: (i) ESGenius-QA, a collection of 1,136 multiple-choice questions generated by LLMs and rigorously validated by domain experts, covering a broad range of ESG pillars and sustainability topics, with each question systematically linked to its corresponding source text to enable transparent evaluation and support retrieval-augmented generation (RAG) methods; and (ii) ESGenius-Corpus, a meticulously curated repository of 231 foundational frameworks, standards, reports and recommendation documents from seven authoritative sources. Moreover, to fully assess the capabilities and adaptation potential of the models, we implement a rigorous two-stage evaluation protocol: Zero-Shot and RAG. Extensive experiments across 50 LLMs (ranging from 0.5B to 671B parameters) demonstrate that state-of-the-art models achieve only moderate performance in zero-shot settings, with accuracies typically around 55–70%, highlighting how challenging ESGenius is for LLMs in interdisciplinary contexts. However, models employing RAG show significant performance improvements, particularly smaller models. For example, "DeepSeek-R1-Distill-Qwen-14B" improves from 63.82% (zero-shot) to 80.46% with RAG. These results underscore the necessity of grounding responses in authoritative sources for enhanced ESG understanding. To the best of our knowledge, ESGenius is the first benchmark curated for LLMs and related enhancement technologies that focuses on ESG and sustainability topics.
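The headline RAG result can be checked with simple arithmetic. The sketch below assumes (the abstract does not state it explicitly) that both accuracies are computed over all 1,136 ESGenius-QA questions; under that assumption 63.82% and 80.46% correspond to whole-number counts of correct answers, and the improvement is 16.64 percentage points, or roughly 26% relative to the zero-shot score. The constant and function names are illustrative only.

```python
from typing import Tuple

NUM_QUESTIONS = 1136   # size of ESGenius-QA
ZERO_SHOT_ACC = 63.82  # DeepSeek-R1-Distill-Qwen-14B, zero-shot accuracy (%)
RAG_ACC = 80.46        # same model with RAG (%)


def implied_correct(acc_percent: float, n: int = NUM_QUESTIONS) -> int:
    """Correct answers implied by an accuracy, assuming it is measured over n questions."""
    return round(acc_percent / 100 * n)


def accuracy_percent(correct: int, n: int = NUM_QUESTIONS) -> float:
    """Accuracy in percent, rounded to two decimals as in the abstract."""
    return round(100 * correct / n, 2)


def gain(before: float, after: float) -> Tuple[float, float]:
    """Absolute gain (percentage points) and relative gain (%), both rounded."""
    absolute = after - before
    return round(absolute, 2), round(100 * absolute / before, 2)


if __name__ == "__main__":
    zs, rag = implied_correct(ZERO_SHOT_ACC), implied_correct(RAG_ACC)
    print(zs, rag)                                      # 725 914
    print(accuracy_percent(zs), accuracy_percent(rag))  # 63.82 80.46
    print(gain(ZERO_SHOT_ACC, RAG_ACC))                 # (16.64, 26.07)
```

Keeping absolute (percentage-point) and relative gains separate matters when comparing models that start from different zero-shot baselines.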
Problem

Evaluating LLMs' proficiency in ESG and sustainability knowledge
Assessing LLMs' performance in zero-shot and RAG settings
Providing a benchmark for ESG-focused question answering
Innovation

LLM-generated and expert-validated ESG questions
Retrieval-augmented generation (RAG) for performance boost
Comprehensive ESG corpus from authoritative sources
Authors

Chaoyue He
Research Scientist at Alibaba-NTU Global e-Sustainability CorpLab (ANGEL)
LLMs, Sustainability, Recommender System, Deep Learning, Post Training

Xin Zhou
Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore

Yi Wu
Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore

Xinjia Yu
Interdisciplinary Graduate School (IGS), Nanyang Technological University (NTU), Singapore
Affective Computing, Multi-agent Systems, Serious Games, Social Studies

Yan Zhang
Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore

Lei Zhang
Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore

Di Wang
Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore

Shengfei Lyu
Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore

Hong Xu
Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore

Xiaoqiao Wang
Alibaba Group, China

Wei Liu
Alibaba Group, China

C. Miao
Alibaba-NTU Global e-Sustainability CorpLab (ANGEL), Singapore