🤖 AI Summary
This work addresses the challenge of large language models (LLMs) failing to adhere faithfully to multi-section templates in financial report generation. We propose two information retrieval (IR)-enhanced frameworks: AgenticIR, which employs multi-agent collaborative retrieval, and DecomposedIR, a template-decomposition-based prompt chaining approach. We conduct the first systematic comparison of agent-based versus decomposition-based IR paradigms for structured financial text generation. Results show that DecomposedIR achieves statistically significant gains over AgenticIR in both coverage breadth and detail fidelity, across both the financial and meteorological reporting domains. While AgenticIR yields more concise outputs, it frequently omits critical subsections. Our contribution lies in integrating template-driven prompt chaining with a structured evaluation protocol, encompassing both reference-free and expert-reference evaluation scenarios, thereby challenging industry's reliance on complex agent architectures and establishing a more efficient, controllable paradigm for structured financial text generation.
📝 Abstract
Tailoring structured financial reports from companies' earnings releases is crucial for understanding financial performance and has been widely adopted in real-world analytics. However, existing summarization methods often generate broad, high-level summaries, which may lack the precision and detail required for financial reports that typically focus on specific, structured sections. While Large Language Models (LLMs) hold promise, generating reports that adhere to predefined multi-section templates remains challenging. This paper investigates two LLM-based approaches popular in industry for generating templated financial reports: an agentic information retrieval (IR) framework and a decomposed IR approach, namely AgenticIR and DecomposedIR. AgenticIR utilizes collaborative agents prompted with the full template. In contrast, DecomposedIR applies a prompt chaining workflow to break down the template and reframe each section as a query answered by the LLM using the earnings release. To quantitatively assess the generated reports, we evaluated both methods in two scenarios: one using a financial dataset without direct human references, and another with a weather-domain dataset featuring expert-written reports. Experimental results show that while AgenticIR may excel in orchestrating tasks and generating concise reports through agent collaboration, DecomposedIR outperforms the AgenticIR approach with statistical significance in providing broader and more detailed coverage in both scenarios, prompting reflection on the use of agentic frameworks in real-world applications.
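The DecomposedIR workflow described above (split the template into sections, reframe each as a query, answer each query against the earnings release) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the section names and the `call_llm` function are hypothetical placeholders, with `call_llm` stubbed so the example runs without an LLM backend.

```python
# Hypothetical template sections; the paper's actual template is not reproduced here.
TEMPLATE_SECTIONS = [
    "Revenue and earnings highlights",
    "Segment performance",
    "Guidance and outlook",
]

def call_llm(prompt: str) -> str:
    # Placeholder stand-in: a real implementation would call an LLM API here.
    return f"[draft answer for prompt: {prompt[:40]}...]"

def decomposed_ir(earnings_release: str, sections=TEMPLATE_SECTIONS) -> dict:
    """Decompose the report template and answer each section as its own query,
    chaining one prompt per section over the same earnings release."""
    report = {}
    for section in sections:
        prompt = (
            f"Using only the earnings release below, write the "
            f"'{section}' section of the report.\n\n{earnings_release}"
        )
        report[section] = call_llm(prompt)
    return report

report = decomposed_ir("ACME Corp Q3: revenue rose 12% to $1.2B ...")
for name, text in report.items():
    print(f"## {name}\n{text}\n")
```

Because each section is generated by its own focused query, no subsection can be silently skipped, which is consistent with the coverage advantage the abstract reports for DecomposedIR over the single full-template prompt used by AgenticIR.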