Learning to Trade Like an Expert: Cognitive Fine-Tuning for Stable Financial Reasoning in Language Models

📅 2026-04-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization capability of large language models in financial trading and the absence of reliable training and evaluation methodologies, particularly in noisy market environments lacking explicit labels. To this end, the authors introduce the first structured financial reasoning dataset that integrates textbook knowledge, historical market data, and AI-validated insights, alongside a two-stage evaluation framework based on temporal simulation. By incorporating cognitive fine-tuning, structured reasoning trajectories, AI committee validation, and multiple-choice-driven trading simulations, the approach effectively mitigates shortcut learning. The resulting open-source model demonstrates robust, risk-aware trading behavior across diverse market conditions, significantly outperforming existing open-source baselines and approaching the performance of state-of-the-art models despite its smaller scale.

📝 Abstract

Recent deployments of large language models (LLMs) as autonomous trading agents raise questions about whether financial decision-making competence generalizes beyond specific market patterns and how it should be trained and evaluated in noisy markets lacking ground truth. We propose a structured framework for training and evaluating such models. Central to our approach is a curated, multiple-choice question (MCQ) dataset derived from classic textbooks and historical markets, verified by an AI committee, enriched with structured reasoning traces, and augmented to reduce shortcut learning. To evaluate whether performance on isolated MCQs generalizes to real-world trading, we introduce a two-stage protocol combining test-set evaluation with an MCQ-based chronological trading simulation. Extensive evaluations across market regimes provide statistically robust evidence that open models trained with our framework exhibit competitive, risk-aware behavior over time, outperform open-source baselines, and approach frontier-model performance at smaller scale. We release the dataset and evaluation framework to support further research.
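The second stage of the protocol, an MCQ-based chronological trading simulation, can be pictured with a minimal sketch. Everything in it is an illustrative assumption, not the authors' released framework: the three-option question format, the option-to-position mapping in ACTIONS, the function names simulate and momentum_policy, and the momentum rule that stands in for a language model. The one property it is meant to show is that at each step the answering function sees only prices up to the current date, so a decision cannot use future information.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical mapping from an MCQ option letter to a target position.
ACTIONS = {"A": 1.0, "B": 0.0, "C": -1.0}  # long, flat, short


@dataclass
class Step:
    date: str
    choice: str
    position: float
    pnl: float


def simulate(
    dates: Sequence[str],
    prices: Sequence[float],
    answer_mcq: Callable[[str, Sequence[float]], str],
) -> Tuple[float, List[Step]]:
    """Walk forward in time, asking one MCQ per date.

    The answering function receives only prices up to and including
    the current date; the next price is used solely for scoring.
    """
    equity = 1.0
    steps: List[Step] = []
    for t in range(len(prices) - 1):
        history = prices[: t + 1]  # no look-ahead
        question = (
            f"Date {dates[t]}. Recent closes: {list(history[-5:])}. "
            "Choose: A) go long  B) stay flat  C) go short"
        )
        choice = answer_mcq(question, history)
        position = ACTIONS.get(choice, 0.0)  # unknown answers count as flat
        next_return = prices[t + 1] / prices[t] - 1.0
        pnl = position * next_return
        equity *= 1.0 + pnl
        steps.append(Step(dates[t], choice, position, pnl))
    return equity, steps


def momentum_policy(question: str, history: Sequence[float]) -> str:
    """Stand-in for a model: follow the sign of the last price change."""
    if len(history) < 2:
        return "B"
    return "A" if history[-1] > history[-2] else "C"


if __name__ == "__main__":
    demo_dates = ["d1", "d2", "d3", "d4"]
    demo_prices = [100.0, 110.0, 121.0, 108.9]
    final_equity, log = simulate(demo_dates, demo_prices, momentum_policy)
    print(final_equity, [s.choice for s in log])
```

In a real evaluation, answer_mcq would wrap a call to the model under test and parse its chosen option letter, and the resulting equity curve could be compared across market regimes.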

Problem

Research questions and friction points this paper is trying to address.

financial reasoning

language models

trading agents

generalization

evaluation framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

cognitive fine-tuning

structured reasoning traces

shortcut learning mitigation