Automating Financial Statement Audits with Large Language Models

📅 2025-06-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Financial statement auditing faces practical bottlenecks, including low efficiency and frequently overlooked errors. To address this, we introduce the first audit benchmark integrating real-world financial statement tables with synthetically generated transactional data, accompanied by a five-stage evaluation framework that systematically assesses large language models (LLMs) across core auditing tasks: error detection, accounting standard tracing, and correction execution. We propose a novel evaluation paradigm oriented toward compliance with accounting standards: financial errors are mapped precisely to specific clauses of authoritative standards (e.g., IFRS or ASC), establishing a reproducible, multi-dimensional benchmark of LLM auditing capability. Experimental results show that state-of-the-art LLMs detect reporting errors effectively but exhibit significant limitations in interpreting standards, attributing errors to the relevant provisions, and generating compliant corrections. This work provides a standardized testing baseline and a methodological foundation for developing domain-specific auditing LLMs.
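
The summary names three core tasks (error detection, standard tracing, correction execution) within a five-stage framework, but the stage definitions themselves are not spelled out here. As a rough sketch of how such a staged evaluation could be scored, the Python below is purely illustrative: `AuditCase`, its fields, and the `query_llm` callable are hypothetical placeholders, and only the three named tasks are scored.

```python
from dataclasses import dataclass

@dataclass
class AuditCase:
    """One benchmark item; these field names are illustrative, not the paper's schema."""
    statement: str            # serialized financial statement table
    transactions: str         # synthesized transaction records
    seeded_error: str         # short description of the injected error
    violated_clause: str      # e.g. "IFRS 15.31" or "ASC 606-10-25-23"
    corrected_statement: str  # gold-standard revised statement

def score_case(case: AuditCase, query_llm) -> dict:
    """Score one case across the core auditing tasks.

    `query_llm(prompt) -> str` stands in for any chat-model call; a full
    five-stage framework would add intermediate checks around these steps.
    """
    scores = {}
    # Error detection: can the model surface the seeded error?
    found = query_llm(
        f"Audit this statement for errors.\nStatement:\n{case.statement}\n"
        f"Transactions:\n{case.transactions}"
    )
    scores["detection"] = case.seeded_error.lower() in found.lower()
    # Standard tracing: does the model cite the violated clause?
    cited = query_llm(f"Which accounting standard clause does this violate?\n{found}")
    scores["tracing"] = case.violated_clause in cited
    # Correction execution: does the revision match the gold answer?
    fixed = query_llm(f"Revise the statement to comply with the standard:\n{case.statement}")
    scores["correction"] = fixed.strip() == case.corrected_statement.strip()
    return scores
```

Exact string checks like these are a crude proxy; a real scorer would likely use normalized matching or judge-based evaluation.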

📝 Abstract
Financial statement auditing is essential for stakeholders to understand a company's financial health, yet current manual processes are inefficient and error-prone. Even with extensive verification procedures, auditors frequently miss errors, leading to inaccurate financial statements that fail to meet stakeholder expectations for transparency and reliability. To this end, we harness large language models (LLMs) to automate financial statement auditing and rigorously assess their capabilities, providing insights into their performance boundaries in automated auditing scenarios. Our work introduces a comprehensive benchmark built on a curated dataset that combines real-world financial tables with synthesized transaction data. Within this benchmark, we develop a rigorous five-stage evaluation framework to assess LLMs' auditing capabilities. The benchmark also challenges models to map specific financial statement errors to the corresponding violations of accounting standards, simulating real-world auditing scenarios through constructed test cases. Our testing reveals that current state-of-the-art LLMs successfully identify financial statement errors when given historical transaction data. However, these models demonstrate significant limitations in explaining detected errors and citing the relevant accounting standards. Furthermore, LLMs struggle to execute complete audits and make the necessary financial statement revisions. These findings highlight a critical gap in LLMs' domain-specific accounting knowledge. Future research must focus on enhancing LLMs' understanding of auditing principles and procedures. Our benchmark and evaluation framework establish a foundation for developing more effective automated auditing tools that can substantially improve the accuracy and efficiency of real-world financial statement auditing.
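
One concrete piece of such an evaluation, deciding whether a model cited the clause that was actually violated, reduces to matching free-form citations against a gold label. Below is a minimal matcher sketch assuming citations follow common patterns such as "IFRS 15.31" or "ASC 606-10-25-23"; the regular expressions and canonical formats are assumptions for illustration, not the paper's scorer.

```python
import re

# Hypothetical patterns for two common citation styles; real citations
# are messier (subparagraphs, letters, ranges) and would need more care.
_IFRS = re.compile(r"IFRS\s*(\d+)\D+(\d+)", re.IGNORECASE)
_ASC = re.compile(r"ASC\s*(\d+(?:-\d+)*)", re.IGNORECASE)

def normalize_clause(citation: str) -> str | None:
    """Map a free-form citation to a canonical key, or None if unrecognized."""
    m = _IFRS.search(citation)
    if m:  # matches "IFRS 15.31" and "IFRS 15, paragraph 31" alike
        return f"IFRS {m.group(1)}.{m.group(2)}"
    m = _ASC.search(citation)
    if m:  # matches "ASC 606" as well as "ASC 606-10-25-23"
        return f"ASC {m.group(1)}"
    return None

def clause_match(predicted: str, gold: str) -> bool:
    """True when both citations normalize to the same canonical clause."""
    p, g = normalize_clause(predicted), normalize_clause(gold)
    return p is not None and p == g
```
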
Problem

Research questions and friction points this paper is trying to address.

Automating financial audits to improve efficiency and accuracy
Assessing LLMs' capabilities in identifying financial statement errors
Addressing gaps in LLMs' accounting knowledge and error explanation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automate auditing using large language models
Develop a five-stage evaluation framework for LLMs
Combine real financial tables with synthetic transaction data in a benchmark (see the sketch below)
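
The data-generation procedure itself is not detailed in this summary, but the pairing of statement tables with synthesized transactions and seeded errors can be pictured with a toy pipeline. Everything below (the function names, and a single 10% revenue misstatement standing in for a richer error taxonomy) is a hypothetical illustration.

```python
import random

def synthesize_transactions(n: int, seed: int = 0) -> list[dict]:
    """Generate toy transactions; real benchmark data would be far richer."""
    rng = random.Random(seed)
    accounts = ["Revenue", "COGS", "Accrued Liabilities", "Prepaid Expenses"]
    return [
        {"account": rng.choice(accounts), "amount": round(rng.uniform(100, 10_000), 2)}
        for _ in range(n)
    ]

def build_statement(transactions: list[dict]) -> dict[str, float]:
    """Aggregate transactions into statement line items (the gold table)."""
    totals: dict[str, float] = {}
    for t in transactions:
        totals[t["account"]] = round(totals.get(t["account"], 0.0) + t["amount"], 2)
    return totals

def seed_error(statement: dict[str, float]) -> tuple[dict[str, float], str]:
    """Inject one controlled error so detection has a known ground truth."""
    corrupted = dict(statement)
    corrupted["Revenue"] = round(corrupted.get("Revenue", 0.0) * 1.10, 2)
    return corrupted, "Revenue overstated by 10%"

# Assemble one benchmark item: gold table, corrupted table, error label.
transactions = synthesize_transactions(50)
gold_table = build_statement(transactions)
corrupted_table, error_label = seed_error(gold_table)
```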