Fine-Tuning Small Reasoning Models for Quantum Field Theory

📅 2026-04-20

🤖 AI Summary
This study addresses the lack of systematic investigation into the reasoning capabilities of small language models in theoretical physics domains such as quantum field theory (QFT), a gap compounded by the scarcity of high-quality, verifiable domain-specific data. The work presents the first academic supervised fine-tuning (SFT) and reinforcement learning (RL) experiments on 7B-parameter reasoning models dedicated to theoretical physics, using a hybrid dataset that combines synthetically generated and human-adapted problems. It introduces a reproducible pipeline for data generation and adaptation, and releases approximately 200 million tokens of QFT reasoning traces together with over 2,500 synthetic QFT problems. Benchmarks and an analysis of reasoning chains before and after training indicate substantial performance gains on QFT tasks and promising generalization to other areas of theoretical physics.

📝 Abstract
Despite the growing application of Large Language Models (LLMs) to theoretical physics, there has been little academic exploration of how domain-specific physics reasoning ability develops as these models are trained. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because open-source, verifiable training data for such capabilities is scarce, we developed a robust data generation pipeline that can both create synthetic problems and make existing human-authored problems suitable for model training. Selecting Quantum Field Theory (QFT) as our primary domain, we generated over 2,500 synthetic problems alongside a curated collection of human-adapted problems sourced from arXiv and standard pedagogical resources. We conduct both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, benchmarking performance gains as well as generalization to other physics domains. We perform an extensive analysis of model chains of thought before and after fine-tuning to understand how reasoning errors evolve during RL and SFT. Finally, we publicly release our data pipeline, verifiable QFT training data, and $\sim$200M tokens of QFT reasoning traces.
Problem

Research questions and friction points this paper is trying to address.

Quantum Field Theory
Reasoning Models
Fine-Tuning
Training Data Scarcity
Domain-Specific Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

fine-tuning
reasoning models
Quantum Field Theory
synthetic data generation
chain-of-thought analysis