Automated Evidence Extraction and Scoring for Corporate Climate Policy Engagement: A Multilingual RAG Approach

📅 2025-09-10
🤖 AI Summary
This study addresses the time-consuming, error-prone, and linguistically limited manual assessment of evidence on corporate climate policy engagement. It proposes a multilingual Retrieval-Augmented Generation (RAG) framework that integrates layout-aware document parsing, the Nomic multilingual embedding model, and few-shot prompting to retrieve, extract, and score climate-related statements from the documents of multinational enterprises. The key contribution is described as the first incorporation of document structural information alongside multilingual semantic embeddings into the RAG pipeline, which makes cross-lingual understanding more robust. Experiments are reported to show a 3.2× improvement in evidence extraction efficiency over baselines on multilingual corporate documents and an F1-score of 0.89, described as the current state of the art. Expert validation nevertheless remains necessary for nuanced judgments of policy stance in complex contexts.

📝 Abstract
InfluenceMap's LobbyMap Platform monitors the climate policy engagement of over 500 companies and 250 industry associations, assessing each entity's support or opposition to science-based policy pathways for achieving the Paris Agreement's goal of limiting global warming to 1.5°C. Although InfluenceMap has made progress with automating key elements of the analytical workflow, a significant portion of the assessment remains manual, making it time- and labor-intensive and susceptible to human error. We propose an AI-assisted framework to accelerate the monitoring of corporate climate policy engagement by leveraging Retrieval-Augmented Generation to automate the most time-intensive extraction of relevant evidence from large-scale textual data. Our evaluation shows that a combination of layout-aware parsing, the Nomic embedding model, and few-shot prompting strategies yields the best performance in extracting and classifying evidence from multilingual corporate documents. We conclude that while the automated RAG system effectively accelerates evidence extraction, the nuanced nature of the analysis necessitates a human-in-the-loop approach where the technology augments, rather than replaces, expert judgment to ensure accuracy.
Problem

Research questions and friction points this paper is trying to address.

Automating evidence extraction from corporate climate policy documents
Reducing manual labor in assessing Paris Agreement policy alignment
Handling multilingual data for climate engagement analysis accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual RAG approach for evidence extraction
Layout-aware parsing with Nomic embedding model
Few-shot prompting for document classification
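
The retrieve-then-prompt pattern named above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `embed` function is a toy bag-of-words stand-in for a real multilingual embedding model such as Nomic, layout-aware parsing is not modeled (the chunks are assumed to be already extracted), and the function names, example passages, and stance labels are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in (assumption) for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical few-shot examples; texts and labels are invented for illustration.
FEW_SHOT = [
    ("We support a carbon price aligned with the Paris Agreement.", "supportive"),
    ("The proposed emissions standard is unworkable and should be withdrawn.", "opposing"),
]

def build_prompt(query: str, evidence: list[str]) -> str:
    """Assemble a few-shot classification prompt around the retrieved evidence."""
    parts = ["Classify each evidence passage's stance on climate policy."]
    for text, label in FEW_SHOT:
        parts.append(f"Passage: {text}\nStance: {label}")
    parts.append(f"Query: {query}")
    for passage in evidence:
        parts.append(f"Passage: {passage}\nStance:")
    return "\n\n".join(parts)

chunks = [
    "Our company supports the EU carbon border adjustment mechanism.",
    "Quarterly revenue grew by four percent.",
    "We oppose stricter vehicle emissions standards proposed by the regulator.",
]
query = "position on emissions standards and carbon policy"
top = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

In a real system the resulting prompt would be sent to a language model, and the lexical `embed` would be replaced by a multilingual embedding so that a query in one language can match evidence written in another.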
Imene Kolli
University of Zurich, Department of Finance
Ario Saeid Vaghefi
University of Zurich, Department of Geography, Switzerland
NLP, Climate change, LLMs, Extreme events, Climate Risks
Chiara Colesanti Senni
University of Zurich, Department of Finance
Shantam Raj
University of Zurich, Department of Informatics
Markus Leippold
University of Zurich, Department of Finance
Finance, Climate Change, Natural Language Processing, Financial Economics, Mathematical Finance