Pharos-ESG: A Framework for Multimodal Parsing, Contextual Narration, and Hierarchical Labeling of ESG Report

📅 2025-11-20
🤖 AI Summary
ESG reports are hard to parse automatically because of their irregular layouts (e.g., slide-style formatting) and implicit semantic hierarchies. To address this, we propose Pharos-ESG, a unified multimodal parsing framework designed specifically for ESG reports. It models reading order from layout flow, combines table-of-contents-guided hierarchical segmentation with a multimodal aggregation step that narrates visual elements as text, and attaches three kinds of labels (ESG, GRI, and sentiment) to its outputs. We further introduce Aurora-ESG, the first large-scale public dataset of ESG reports, comprising over 12,000 documents from Mainland China, Hong Kong, and U.S. markets. Experiments on annotated benchmarks show that the method outperforms both dedicated document parsers and general-purpose multimodal models. The resulting structured outputs are intended to support downstream ESG quantification and financial governance decision-making.

📝 Abstract
Environmental, Social, and Governance (ESG) principles are reshaping the foundations of global financial governance, transforming capital allocation architectures, regulatory frameworks, and systemic risk coordination mechanisms. However, as the core medium for assessing corporate ESG performance, the ESG reports present significant challenges for large-scale understanding, due to chaotic reading order from slide-like irregular layouts and implicit hierarchies arising from lengthy, weakly structured content. To address these challenges, we propose Pharos-ESG, a unified framework that transforms ESG reports into structured representations through multimodal parsing, contextual narration, and hierarchical labeling. It integrates a reading-order modeling module based on layout flow, hierarchy-aware segmentation guided by table-of-contents anchors, and a multimodal aggregation pipeline that contextually transforms visual elements into coherent natural language. The framework further enriches its outputs with ESG, GRI, and sentiment labels, yielding annotations aligned with the analytical demands of financial research. Extensive experiments on annotated benchmarks demonstrate that Pharos-ESG consistently outperforms both dedicated document parsing systems and general-purpose multimodal models. In addition, we release Aurora-ESG, the first large-scale public dataset of ESG reports, spanning Mainland China, Hong Kong, and U.S. markets, featuring unified structured representations of multimodal content, enriched with fine-grained layout and semantic annotations to better support ESG integration in financial governance and decision-making.
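To make the idea of outputs enriched with ESG, GRI, and sentiment labels more concrete, here is a minimal sketch of what one labeled segment might look like and how such records could be aggregated. The abstract does not specify a schema, so the class name `LabeledSegment`, its fields, the label vocabularies, and the helper `sentiment_by_pillar` are all hypothetical illustrations, not the format used by Pharos-ESG or Aurora-ESG.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSegment:
    """Hypothetical record for one parsed text segment (not the paper's schema)."""
    section_path: tuple  # heading titles from the report root down to this segment
    text: str
    esg: str             # assumed vocabulary: "E", "S", or "G"
    gri: str             # assumed to hold a GRI standard identifier, e.g. "GRI 305"
    sentiment: str       # assumed vocabulary: "positive", "neutral", "negative"


def sentiment_by_pillar(segments):
    """Count sentiment labels separately for each ESG pillar."""
    counts = {}
    for seg in segments:
        counts.setdefault(seg.esg, Counter())[seg.sentiment] += 1
    return counts


# Made-up example data, for illustration only.
segments = [
    LabeledSegment(("Environment", "Emissions"), "Scope 1 emissions fell.", "E", "GRI 305", "positive"),
    LabeledSegment(("Environment", "Emissions"), "Scope 3 data is incomplete.", "E", "GRI 305", "negative"),
    LabeledSegment(("Governance",), "The board met six times.", "G", "GRI 2", "neutral"),
]
summary = sentiment_by_pillar(segments)
```

Keeping the three labels as separate fields on each segment, rather than one combined tag, lets downstream analysis group by any one of them independently.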
Problem

Research questions and friction points this paper is trying to address.

Parsing ESG reports with chaotic layouts and implicit hierarchies
Transforming multimodal ESG content into structured representations
Enhancing financial analysis through automated ESG report annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal parsing transforms ESG reports into structured representations
Hierarchy-aware segmentation guided by table-of-contents anchors
Multimodal aggregation pipeline converts visual elements to language
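
The segmentation idea above, using table-of-contents entries as anchors for a section hierarchy, can be illustrated with a small sketch. This is a deliberately simplified stand-in, not the method from the paper: it assumes the text blocks are already in reading order, that each TOC title appears as its own block, and that headings occur in TOC order. The names `TocEntry`, `Section`, and `segment_by_toc` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    title: str
    level: int  # 1 = top-level chapter, 2 = subsection, and so on


@dataclass
class Section:
    path: tuple                      # heading titles from the root down to this section
    blocks: list = field(default_factory=list)


def normalize(text):
    """Lowercase and collapse whitespace so minor formatting differences still match."""
    return " ".join(text.lower().split())


def segment_by_toc(toc, blocks):
    """Assign each block to the most recently matched TOC heading (simplified heuristic)."""
    sections = [Section(path=("<front matter>",))]
    stack = []     # titles of the currently open headings, outermost first
    next_idx = 0   # TOC entries are matched strictly in order
    for block in blocks:
        if next_idx < len(toc) and normalize(block) == normalize(toc[next_idx].title):
            entry = toc[next_idx]
            next_idx += 1
            del stack[entry.level - 1:]   # close headings at this level or deeper
            stack.append(entry.title)
            sections.append(Section(path=tuple(stack)))
        else:
            sections[-1].blocks.append(block)
    return sections


# Made-up example data, for illustration only.
toc = [TocEntry("Environment", 1), TocEntry("Emissions", 2), TocEntry("Governance", 1)]
blocks = [
    "About this report",
    "Environment",
    "Our approach",
    "Emissions",
    "Scope 1 fell.",
    "Governance",
    "Board oversight",
]
sections = segment_by_toc(toc, blocks)
```

Exact title matching is the weakest part of this sketch; real reports would likely need fuzzy matching and layout cues such as font size, which is presumably where a learned, hierarchy-aware model helps.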
Authors

Yan Chen
School of Finance, Institute of Chinese Financial Studies, Fintech Innovation Center, Southwestern University of Finance and Economics, Chengdu, China
Yu Zou
School of Finance, Institute of Chinese Financial Studies, Fintech Innovation Center, Southwestern University of Finance and Economics, Chengdu, China
Jialei Zeng
School of Finance, Institute of Chinese Financial Studies, Fintech Innovation Center, Southwestern University of Finance and Economics, Chengdu, China
Haoran You
Georgia Tech
Efficient ML, Alg-HW Co-Design
Xiaorui Zhou
School of Finance, Institute of Chinese Financial Studies, Fintech Innovation Center, Southwestern University of Finance and Economics, Chengdu, China
Aixi Zhong
School of Finance, Institute of Chinese Financial Studies, Fintech Innovation Center, Southwestern University of Finance and Economics, Chengdu, China