What Factors Affect LLMs and RLLMs in Financial Question Answering?

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically investigates performance optimization pathways for large language models (LLMs) and reasoning-augmented LLMs (RLLMs) in financial question answering. Addressing the domain’s high specialization and multilingual requirements, we evaluate the impact of prompt engineering, agent frameworks, and multilingual alignment techniques across five LLMs and three RLLMs. Results show that chain-of-thought prompting significantly enhances LLM performance, whereas RLLMs—equipped with intrinsic reasoning capabilities—exhibit limited gains from conventional prompt optimization. Multilingual alignment improves LLM cross-lingual generalization primarily by extending reasoning paths, but yields negligible benefits for RLLMs. To our knowledge, this is the first work to empirically uncover divergent optimization mechanisms between LLMs and RLLMs in finance, establishing a “capability–method” matching principle. Our findings provide evidence-based guidance for domain-specific model selection and customization, advancing both practical deployment and methodological understanding.

📝 Abstract
Recently, the development of large language models (LLMs) and reasoning large language models (RLLMs) has gained considerable attention from researchers. RLLMs enhance the reasoning capabilities of LLMs through Long Chain-of-Thought (Long CoT) processes, significantly improving LLM performance on complex problems. However, few works systematically explore which methods can fully unlock the performance of LLMs and RLLMs within the financial domain. To investigate the impact of various methods on LLMs and RLLMs, we use five LLMs and three RLLMs to assess the effects of prompting methods, agentic frameworks, and multilingual alignment methods on financial question-answering tasks. Our findings indicate: (1) current prompting methods and agentic frameworks enhance the performance of LLMs in financial question answering by simulating Long CoT; (2) RLLMs possess inherent Long CoT capabilities, which limits the effectiveness of conventional methods in further enhancing their performance; (3) current advanced multilingual alignment methods primarily improve the multilingual performance of LLMs by extending the reasoning length, yielding minimal benefits for RLLMs. We hope this study serves as an important reference for applying LLMs and RLLMs to financial question answering.
Problem

Research questions and friction points this paper is trying to address.

Exploring methods to optimize LLMs and RLLMs in financial QA
Assessing prompting, agent frameworks, and multilingual alignment effects
Evaluating performance gaps between LLMs and RLLMs in finance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizing Long CoT to enhance LLM reasoning
Assessing the impact of prompting methods and agentic frameworks
Exploring multilingual alignment for cross-lingual performance gains
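The paper contrasts direct prompting with chain-of-thought (CoT) prompting for financial QA. A minimal illustrative sketch of the difference (the prompt wording and the `build_prompt` helper are hypothetical, not taken from the paper):

```python
# Illustrative sketch: a direct prompt vs. a chain-of-thought (CoT) prompt
# for a financial QA item. The exact wording here is an assumption, not
# the prompts used in the study.

def build_prompt(question: str, use_cot: bool = False) -> str:
    """Return a financial-QA prompt, optionally with a CoT trigger phrase."""
    instruction = "Answer the following financial question."
    cot_trigger = "Let's think step by step before giving the final answer."
    parts = [instruction, f"Question: {question}"]
    if use_cot:
        parts.append(cot_trigger)  # CoT variant appends the reasoning trigger
    parts.append("Answer:")
    return "\n".join(parts)

question = "If a bond's yield rises, what typically happens to its price?"
print(build_prompt(question, use_cot=False))
print("---")
print(build_prompt(question, use_cot=True))
```

Per the paper's findings, the CoT variant would be expected to help the five LLMs but to add little for the three RLLMs, whose long-form reasoning is already built in.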
Authors

Peng Wang
School of Computer Science and Engineering, Macau University of Science and Technology, China

Xuesi Hu
School of Economics, Anhui University, China

Jiageng Wu
Harvard University

Yuntao Zou
School of Computer Science and Engineering, Macau University of Science and Technology, China; School of Energy and Power Engineering, Huazhong University of Science and Technology, China

Qiancheng Zhang
School of Economics, Anhui University, China

Dagang Li
Macau University of Science and Technology