🤖 AI Summary
To address outdated climate knowledge in large language models, the low accuracy of retrieval-augmented generation (RAG) when parsing complex sustainability reports, and the high cost of manual carbon analysis, this paper proposes a domain-specific large language model system for corporate carbon emission analysis and climate policy question answering. Methodologically, it introduces a 14-dimensional carbon accounting framework; an enhanced self-prompting RAG architecture that integrates intent recognition, structured reasoning chains, hybrid retrieval, and Text2SQL; and a timestamped, hallucination-aware multi-level chunking mechanism. The system incorporates GHG Protocol standards, collaborative parsing of rule-based and long-context documents, and verifiability enhancements. Experiments show improvements in sustainability report summarization, relevance assessment, and customized Q&A accuracy, along with a 37.2% reduction in hallucination rates. The system also supports real-time knowledge updates and traceable verification, yielding accurate, low-cost, and interpretable carbon analysis.
📝 Abstract
As the impact of global climate change intensifies, corporate carbon emissions have become a focus of global attention. Large language models lag behind current climate change knowledge, traditional retrieval-augmented generation architectures lack the specialization and accuracy needed for complex problems, and sustainability report analysis is costly and time-consuming. To address these issues, this paper proposes CarbonChat, a large language model-based system for corporate carbon emission analysis and climate knowledge Q&A, aimed at precise carbon emission analysis and policy understanding. First, a diversified index module construction method is proposed to handle the segmentation of rule-based and long-text documents, as well as the extraction of structured data, thereby optimizing the parsing of key information. Second, an enhanced self-prompt retrieval-augmented generation architecture is designed that integrates intent recognition, structured reasoning chains, hybrid retrieval, and Text2SQL, improving the efficiency of semantic understanding and query conversion. Third, based on the greenhouse gas accounting framework, 14 dimensions are established for carbon emission analysis, enabling report summarization, relevance evaluation, and customized responses. Finally, a multi-layer chunking mechanism, timestamps, and hallucination detection features ensure the accuracy and verifiability of the analysis results, reducing hallucination rates and improving the precision of the responses.
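The multi-layer chunking with timestamps mentioned above could look roughly like the following minimal sketch. It is not the paper's implementation: the `Chunk` fields, the two levels (section and paragraph), the blank-line paragraph split, and the `acme-2023` sample report are all assumptions made for illustration. The idea shown is that every chunk carries the publication date of its source and a pointer to its parent, so an answer can be traced back to a dated passage.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    chunk_id: str              # hierarchical id, e.g. "acme-2023/s0/p1" (hypothetical scheme)
    level: str                 # "section" or "paragraph"
    parent_id: Optional[str]   # id of the enclosing section, None for sections
    published: date            # timestamp inherited from the source report
    text: str


def chunk_report(doc_id: str, published: date, sections: dict[str, str]) -> list[Chunk]:
    """Split a report into section-level and paragraph-level chunks.

    Assumption: `sections` maps a section title to its body, and
    paragraphs inside a body are separated by blank lines.
    """
    chunks: list[Chunk] = []
    for s_idx, (title, body) in enumerate(sections.items()):
        section_id = f"{doc_id}/s{s_idx}"
        # Coarse level: one chunk per section, holding the title.
        chunks.append(Chunk(section_id, "section", None, published, title))
        # Fine level: one chunk per non-empty paragraph, linked to its section.
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
        for p_idx, para in enumerate(paragraphs):
            chunks.append(
                Chunk(f"{section_id}/p{p_idx}", "paragraph", section_id, published, para)
            )
    return chunks


def newest_first(chunks: list[Chunk], level: str = "paragraph") -> list[Chunk]:
    """Return chunks of one level, most recently published first."""
    selected = [c for c in chunks if c.level == level]
    return sorted(selected, key=lambda c: c.published, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real report.
    sample = {
        "Scope 1": "Direct emissions fell.\n\nFleet fuel use dropped.",
        "Scope 2": "Purchased electricity rose.",
    }
    for chunk in chunk_report("acme-2023", date(2024, 3, 1), sample):
        print(chunk.chunk_id, chunk.level, chunk.published.isoformat())
```

Keeping the timestamp on every chunk, rather than only on the document, lets a retriever prefer newer passages when reports from different years conflict, and the parent pointer lets a paragraph-level hit be expanded to its section for context.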