QuantEval: A Benchmark for Financial Quantitative Tasks in Large Language Models

📅 2026-01-13
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the limited evaluation of large language models (LLMs) on financial quantitative tasks, which to date has focused predominantly on knowledge-based question answering and therefore fails to capture genuine quantitative reasoning and strategy-implementation capabilities. To bridge this gap, we propose QuantEval, a comprehensive benchmark that systematically assesses models along three dimensions: financial knowledge, quantitative mathematical reasoning, and strategy coding. For the first time, QuantEval integrates an executable CTA-style backtesting framework and standard financial performance metrics into the evaluation pipeline, enabling realistic and reproducible assessment of quantitative skills. Leveraging a deterministic backtesting environment (a fixed asset universe, an explicit transaction-cost model, and standardized metrics), we fine-tune models via supervised learning and reinforcement learning on domain-specific data. Experiments reveal that current state-of-the-art models significantly underperform human experts in reasoning and strategy generation, while our approach markedly improves their performance on QuantEval.
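
To make the "deterministic backtesting environment" concrete, below is a minimal sketch of a CTA-style evaluation loop in the spirit the summary describes. All names here (`backtest`, `momentum`, the 5 bp cost) are illustrative assumptions, not QuantEval's actual pipeline: a model-generated signal function is executed over a price series under a fixed seed, transaction costs are charged on position changes, and the resulting net returns feed the performance metrics.

```python
import numpy as np

def backtest(prices, signal_fn, cost_bps=5.0):
    """Deterministic long/short backtest for a single asset.

    prices    : 1-D array of daily closes
    signal_fn : maps the price history seen so far to a position in [-1, 1]
    cost_bps  : one-way transaction cost in basis points
    Returns the strategy's daily net-return series.
    """
    positions = np.zeros(len(prices))
    for t in range(1, len(prices)):
        # The strategy only sees data up to and including day t (no look-ahead).
        positions[t] = np.clip(signal_fn(prices[: t + 1]), -1.0, 1.0)

    asset_rets = np.diff(prices) / prices[:-1]               # next-day asset returns
    gross = positions[:-1] * asset_rets                      # P&L of the held position
    turnover = np.abs(np.diff(positions, prepend=0.0))[:-1]  # position changes
    return gross - turnover * cost_bps * 1e-4                # charge costs on turnover

def momentum(history, lookback=20):
    """Toy stand-in for a model-generated strategy: sign of 20-day momentum."""
    if len(history) <= lookback:
        return 0.0
    return 1.0 if history[-1] > history[-lookback - 1] else -1.0

rng = np.random.default_rng(0)  # fixed seed, so repeated runs are identical
prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0003, 0.01, 500))
net = backtest(prices, momentum)
print(f"annualized Sharpe: {np.sqrt(252) * net.mean() / net.std(ddof=1):.2f}")
```

The fixed seed and explicit cost model are what make repeated runs bit-for-bit reproducible, which is the property the deterministic configuration is meant to guarantee.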

📝 Abstract
Large Language Models (LLMs) have shown strong capabilities across many domains, yet their evaluation on financial quantitative tasks remains fragmented and mostly limited to knowledge-centric question answering. We introduce QuantEval, a benchmark that evaluates LLMs across three essential dimensions of quantitative finance: knowledge-based QA, quantitative mathematical reasoning, and quantitative strategy coding. Unlike prior financial benchmarks, QuantEval integrates a CTA-style backtesting framework that executes model-generated strategies and evaluates them using financial performance metrics, enabling a more realistic assessment of quantitative coding ability. We evaluate several state-of-the-art open-source and proprietary LLMs and observe substantial gaps to human experts, particularly in reasoning and strategy coding. Finally, we conduct large-scale supervised fine-tuning and reinforcement learning experiments on domain-aligned data, demonstrating consistent improvements. We hope QuantEval will facilitate research on LLMs' quantitative finance capabilities and accelerate their practical adoption in real-world trading workflows. We additionally release the full deterministic backtesting configuration (asset universe, cost model, and metric definitions) to ensure strict reproducibility.
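
The abstract refers to "financial performance metrics" without naming them. The sketch below shows textbook definitions of three metrics commonly used to score CTA strategies: annualized return, Sharpe ratio, and maximum drawdown. These are standard formulas assumed here for illustration, not necessarily QuantEval's exact metric definitions.

```python
import numpy as np

def annualized_return(net, periods_per_year=252):
    """Geometric annualization of a daily net-return series."""
    return (1.0 + net).prod() ** (periods_per_year / len(net)) - 1.0

def sharpe_ratio(net, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return np.sqrt(periods_per_year) * net.mean() / net.std(ddof=1)

def max_drawdown(net):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity = np.cumprod(1.0 + net)
    running_peak = np.maximum.accumulate(equity)
    return ((equity - running_peak) / running_peak).min()

# Toy usage on a synthetic daily net-return series.
rng = np.random.default_rng(1)
net = rng.normal(0.0005, 0.01, 252)
print(f"ann. return: {annualized_return(net):+.2%}")
print(f"Sharpe:      {sharpe_ratio(net):.2f}")
print(f"max DD:      {max_drawdown(net):.2%}")
```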
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Quantitative Finance
Benchmark
Strategy Coding
Financial Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

QuantEval
quantitative finance benchmark
backtesting framework
LLM evaluation
strategy coding
🔎 Similar Papers
2024-02-17 · Annual Meeting of the Association for Computational Linguistics · Citations: 26
👥 Authors
Zhaolu Kang, Peking University
Junhao Gong, Peking University
Wenqing Hu, Missouri University of Science and Technology (formerly University of Missouri, Rolla) · Probability, Applied Mathematics
Shuo Yin, Tsinghua University
Kehan Jiang, Peking University
Zhicheng Fang, Shanghai Qi Zhi Institute · AI Safety, Natural Language Processing, Computer Vision
Yingjie He, Peking University
Chunlei Meng, Fudan University · Embodied AI, Multimodal, Multi-agent
Rong Fu, University of Macau
Dongyang Chen, Tsinghua University
Leqi Zheng, Tsinghua University
Eric Hanchen Jiang, University of California, Los Angeles
Yunfei Feng, Shanghai Jiao Tong University
Yitong Leng, Imperial College London
Junfan Zhu, University of Chicago
Xiaoyou Chen, Shanghai Weina Software Technology
Xi Yang, Beijing Academy of Artificial Intelligence
Richeng Xuan, Peking University