🤖 AI Summary
This work addresses two obstacles to automated equity research report (ERR) generation: data scarcity and the lack of standardized evaluation. We formally define, for the first time, the end-to-end ERR generation task. Our contribution has three parts: (1) an automated dataset construction pipeline that integrates seven types of financial data to build a high-quality ERR dataset for training and evaluation; (2) FinRpt, an open-source benchmark for ERR generation that pairs this dataset with an evaluation system of 11 metrics; and (3) FinRpt-Gen, an LLM-based multi-agent framework tailored to ERR generation, whose agents are trained with supervised fine-tuning and reinforcement learning. Experiments indicate that the FinRpt data are of high quality and its metrics are effective, and that FinRpt-Gen performs strongly on the task. All code and data are publicly released.
📝 Abstract
While LLMs have shown great success in financial tasks such as stock prediction and question answering, their application to fully automated Equity Research Report (ERR) generation remains uncharted territory. In this paper, we formulate the ERR generation task for the first time. To address data scarcity and the absence of evaluation metrics, we present FinRpt, an open-source evaluation benchmark for ERR generation. We design a Dataset Construction Pipeline that integrates 7 financial data types and automatically produces a high-quality ERR dataset that can be used for model training and evaluation. We also introduce a comprehensive evaluation system of 11 metrics to assess the generated ERRs. Moreover, we propose FinRpt-Gen, a multi-agent framework tailored to this task, and train several LLM-based agents on the proposed datasets using Supervised Fine-Tuning and Reinforcement Learning. Experimental results indicate the data quality and metric effectiveness of the FinRpt benchmark and the strong performance of FinRpt-Gen, showcasing their potential to drive innovation in ERR generation. All code and datasets are publicly available.