🤖 AI Summary
This study systematically investigates performance optimization pathways for large language models (LLMs) and reasoning-augmented LLMs (RLLMs) in financial question answering. Addressing the domain's high specialization and multilingual requirements, we evaluate the impact of prompt engineering, agentic frameworks, and multilingual alignment techniques across five LLMs and three RLLMs. Results show that chain-of-thought prompting significantly enhances LLM performance, whereas RLLMs, which are equipped with intrinsic reasoning capabilities, exhibit limited gains from conventional prompt optimization. Multilingual alignment improves LLM cross-lingual generalization primarily by extending reasoning paths, but yields negligible benefits for RLLMs. To our knowledge, this is the first work to empirically uncover divergent optimization mechanisms between LLMs and RLLMs in finance, establishing a "capability–method" matching principle. Our findings provide evidence-based guidance for domain-specific model selection and customization, advancing both practical deployment and methodological understanding.
📝 Abstract
Recently, large language models (LLMs) and reasoning large language models (RLLMs) have attracted considerable attention from researchers. RLLMs enhance the reasoning capabilities of LLMs through Long Chain-of-Thought (Long CoT) processes, significantly improving their performance on complex problems. However, few works have systematically explored which methods can fully unlock the performance of LLMs and RLLMs in the financial domain. To investigate the impact of various methods on LLMs and RLLMs, we use five LLMs and three RLLMs to assess the effects of prompting methods, agentic frameworks, and multilingual alignment methods on financial question-answering tasks. Our findings indicate that: (1) current prompting methods and agentic frameworks enhance the performance of LLMs in financial question answering by simulating Long CoT; (2) RLLMs possess inherent Long CoT capabilities, which limits the effectiveness of conventional methods in further improving their performance; (3) current advanced multilingual alignment methods primarily improve the multilingual performance of LLMs by extending reasoning length, yielding minimal benefits for RLLMs. We hope this study serves as an important reference for applying LLMs and RLLMs to financial question answering.