DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak reasoning capabilities of large language models (LLMs) in finance, which stem from dependence on domain-specific knowledge, low numerical computation accuracy, and difficulty adhering to regulatory compliance rules, this work constructs a high-quality, multi-source financial reasoning dataset that integrates CFLUE, FinQA, and CCC. We propose a structured supervised fine-tuning paradigm and a dual-signal Group Relative Policy Optimization (GRPO) algorithm that jointly optimizes output-format conformity and answer correctness. This work introduces the first structured reasoning supervision framework tailored to financial applications, achieving single-forward-pass inference performance comparable to that of costly multi-agent systems. Our method outperforms non-reasoning baselines across five major benchmarks, including CFLUE, FinQA, and CCC, and achieves 92.3% accuracy on real-world regulatory compliance tasks in CCC, matching state-of-the-art multi-agent performance.

📝 Abstract
Effective reasoning remains a core challenge for large language models (LLMs) in the financial domain, where tasks often require domain-specific knowledge, precise numerical calculations, and strict adherence to compliance rules. We propose DianJin-R1, a reasoning-enhanced framework designed to address these challenges through reasoning-augmented supervision and reinforcement learning. Central to our approach is DianJin-R1-Data, a high-quality dataset constructed from CFLUE, FinQA, and a proprietary compliance corpus (Chinese Compliance Check, CCC), combining diverse financial reasoning scenarios with verified annotations. Our models, DianJin-R1-7B and DianJin-R1-32B, are fine-tuned from Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct using a structured format that generates both reasoning steps and final answers. To further refine reasoning quality, we apply Group Relative Policy Optimization (GRPO), a reinforcement learning method that incorporates dual reward signals: one encouraging structured outputs and another rewarding answer correctness. We evaluate our models on five benchmarks: three financial datasets (CFLUE, FinQA, and CCC) and two general reasoning benchmarks (MATH-500 and GPQA-Diamond). Experimental results show that DianJin-R1 models consistently outperform their non-reasoning counterparts, especially on complex financial tasks. Moreover, on the real-world CCC dataset, our single-call reasoning models match or even surpass the performance of multi-agent systems that require significantly more computational cost. These findings demonstrate the effectiveness of DianJin-R1 in enhancing financial reasoning through structured supervision and reward-aligned learning, offering a scalable and practical solution for real-world applications.
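
The structured fine-tuning format described above, where each completion carries explicit reasoning steps followed by a final answer, can be pictured with a minimal sketch. This is an illustration under assumptions, not the paper's exact schema: the <think>/<answer> tags, the field names, and the example question are all hypothetical.

```python
# One hypothetical SFT record in the "reasoning steps + final answer" layout.
# The <think>/<answer> tags and the example question are assumptions for
# illustration; DianJin-R1's exact markup is not given in this summary.
sample = {
    "instruction": (
        "A firm reports revenue of 120M CNY and total costs of 90M CNY. "
        "What is its profit margin?"
    ),
    "output": (
        "<think>Profit = 120M - 90M = 30M CNY. "
        "Margin = 30M / 120M = 0.25.</think>\n"
        "<answer>25%</answer>"
    ),
}
```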
Problem

Research questions and friction points this paper is trying to address.

Enhancing financial reasoning in large language models
Addressing domain-specific knowledge and numerical calculations
Ensuring compliance with strict financial rules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reasoning-augmented supervision and reinforcement learning
High-quality dataset combining diverse financial scenarios
Group Relative Policy Optimization with dual rewards
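
To make the last item above concrete, here is a minimal sketch of dual-reward GRPO under stated assumptions, not the paper's implementation: it reuses the hypothetical <think>/<answer> layout from the earlier sketch, weights the two reward signals equally, and checks answer correctness by exact string match.

```python
import re
import statistics

# Assumed <think>/<answer> markup; the exact tags are not given in this summary.
FORMAT_RE = re.compile(r"^<think>.+</think>\s*<answer>.+</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """Signal 1: reward completions that follow the structured layout."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """Signal 2: reward completions whose final answer matches the reference."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

def group_relative_advantages(completions: list[str], gold: str) -> list[float]:
    """GRPO scores a group of sampled completions against each other,
    normalizing the combined reward within the group."""
    rewards = [format_reward(c) + accuracy_reward(c, gold) for c in completions]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

Group-relative normalization is the GRPO design choice that removes the need for a separate value model: each sampled completion is scored only against the other completions drawn for the same prompt.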
👥 Authors
Jie Zhu
Qwen DianJin Team, Alibaba Cloud Computing
Qian Chen
Qwen DianJin Team, Alibaba Cloud Computing
Huaixia Dou
Qwen DianJin Team, Alibaba Cloud Computing; Soochow University
Junhui Li
Soochow University
Lifan Guo
Qwen DianJin Team, Alibaba Cloud Computing
Feng Chen
Qwen DianJin Team, Alibaba Cloud Computing
Chi Zhang
Qwen DianJin Team, Alibaba Cloud Computing