🤖 AI Summary
This work addresses the limited evaluation of large language models (LLMs) in financial quantitative tasks, which has predominantly focused on knowledge-based question answering and fails to capture genuine quantitative reasoning and strategy implementation capabilities. To bridge this gap, we propose QuantEval, a comprehensive benchmark that systematically assesses models across three dimensions: financial knowledge, quantitative mathematical reasoning, and strategy coding. For the first time, QuantEval integrates an executable CTA-style backtesting framework and standard financial performance metrics into the evaluation pipeline, enabling realistic and reproducible assessment of quantitative skills. Leveraging a deterministic backtesting environment—complete with a defined asset universe, transaction costs, and standardized metrics—we train models via supervised fine-tuning and reinforcement learning on domain-specific data. Experiments reveal that current state-of-the-art models significantly underperform human experts in reasoning and strategy generation, yet our approach markedly improves their performance on QuantEval.
📝 Abstract
Large Language Models (LLMs) have shown strong capabilities across many domains, yet their evaluation in financial quantitative tasks remains fragmented and mostly limited to knowledge-centric question answering. We introduce QuantEval, a benchmark that evaluates LLMs across three essential dimensions of quantitative finance: knowledge-based QA, quantitative mathematical reasoning, and quantitative strategy coding. Unlike prior financial benchmarks, QuantEval integrates a CTA-style backtesting framework that executes model-generated strategies and evaluates them using financial performance metrics, enabling a more realistic assessment of quantitative coding ability. We evaluate a range of state-of-the-art open-source and proprietary LLMs and observe substantial gaps to human experts, particularly in reasoning and strategy coding. Finally, we conduct large-scale supervised fine-tuning and reinforcement learning experiments on domain-aligned data, demonstrating consistent improvements. We hope QuantEval will facilitate research on LLMs' quantitative finance capabilities and accelerate their practical adoption in real-world trading workflows. We additionally release the full deterministic backtesting configuration (asset universe, cost model, and metric definitions) to ensure strict reproducibility.
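To make the evaluation pipeline concrete, the core of a deterministic backtest like the one described above can be sketched in a few lines. This is an illustrative simplification, not QuantEval's actual framework: the function names (`backtest`, `sharpe_ratio`, `max_drawdown`), the proportional cost model, and the +1/0/−1 position convention are all assumptions made for the example.

```python
import numpy as np

def backtest(prices, positions, cost_rate=0.0005):
    """Minimal deterministic backtest (illustrative, not QuantEval's framework).

    prices:    asset price per bar.
    positions: target position (+1 long, -1 short, 0 flat) held over each bar.
    cost_rate: proportional transaction cost charged on position changes.
    Returns the per-bar net return series.
    """
    prices = np.asarray(prices, dtype=float)
    positions = np.asarray(positions, dtype=float)
    asset_returns = np.diff(prices) / prices[:-1]            # simple return per bar
    gross = positions[:-1] * asset_returns                   # P&L of the held position
    turnover = np.abs(np.diff(positions, prepend=0.0))[:-1]  # traded position changes
    return gross - cost_rate * turnover                      # net of transaction costs

def sharpe_ratio(net_returns, bars_per_year=252):
    """Annualized Sharpe ratio of a per-bar return series (zero risk-free rate)."""
    sd = net_returns.std(ddof=1)
    return float(np.sqrt(bars_per_year) * net_returns.mean() / sd) if sd > 0 else 0.0

def max_drawdown(net_returns):
    """Largest peak-to-trough decline of the cumulative equity curve."""
    equity = np.cumprod(1.0 + net_returns)
    peak = np.maximum.accumulate(equity)
    return float((equity / peak - 1.0).min())
```

Because all inputs (price series, positions, cost rate) and metric definitions are fixed, the same model-generated strategy always yields the same score, which is what makes this style of evaluation reproducible.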