🤖 AI Summary
Existing domain adaptation methods fail to fully unlock the native reasoning capabilities of large reasoning models (LRMs), and direct fine-tuning on non-reflective data yields limited gains. To address this, the authors propose a lightweight corrective-adaptation paradigm: expert-provided hints guide the model to synthesize high-quality training data *within its own reasoning traces*, improving native reasoning chains while modifying fewer than 2.6% of generated tokens. The framework combines supervised fine-tuning with reinforcement learning: an expert intervener identifies reasoning flaws and supplies concise corrective signals, enabling progressive self-improvement. Built on this framework, the 4B-parameter STORM model achieves an average accuracy of 68.9% across five popular optimization modeling benchmarks, matching the performance of a 671B-parameter model and significantly improving both adaptation efficiency and generalization for small-scale LRMs.
📝 Abstract
Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning, opening new opportunities for automating optimization modeling. However, existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs; in particular, we show that direct fine-tuning on traditional *non-reflective* datasets leads to limited gains. To fully leverage LRMs' inherent reasoning abilities, we propose **CALM** (*Corrective Adaptation with Lightweight Modification*), a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks. In CALM, an expert intervener identifies reasoning flaws and provides concise corrective hints, which the LRM incorporates to produce improved reasoning trajectories. These interventions modify fewer than 2.6% of generated tokens, yet yield high-quality data for soft adaptation through supervised fine-tuning. The adapted model is then further improved through reinforcement learning. Building on CALM, we develop **STORM** (*Smart Thinking Optimization Reasoning Model*), a 4B-parameter LRM that achieves a new state-of-the-art average accuracy of 68.9% across five popular optimization modeling benchmarks, matching the performance of a 671B-parameter LRM. These results demonstrate that dynamic, hint-based data synthesis both preserves and amplifies the native reasoning patterns of modern LRMs, offering a more effective and scalable path toward expert-level performance on challenging optimization modeling tasks.
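The abstract quantifies interventions by the fraction of generated tokens they modify (fewer than 2.6%). The paper's exact metric definition is not given here; the sketch below is one plausible way to measure such a budget, using a standard sequence alignment between the original and hint-corrected reasoning traces. The traces and the `token_modification_fraction` helper are illustrative assumptions, not the authors' implementation.

```python
import difflib

def token_modification_fraction(original_tokens, corrected_tokens):
    """Fraction of the original trace's tokens that do not survive
    into the corrected trace, measured via longest-matching-block
    alignment (an assumed metric, not the paper's definition)."""
    sm = difflib.SequenceMatcher(a=original_tokens, b=corrected_tokens)
    matched = sum(size for _, _, size in sm.get_matching_blocks())
    return 1.0 - matched / max(len(original_tokens), 1)

# Toy traces: the "correction" inserts a missing constraint and
# swaps one wrong word, leaving most of the trace untouched.
original = "let x be the number of units produced then maximize profit".split()
corrected = ("let x be the number of units produced "
             "subject to capacity then maximize revenue").split()

frac = token_modification_fraction(original, corrected)
print(f"modified fraction: {frac:.3f}")
```

On these toy traces only 1 of 11 original tokens is replaced (insertions cost nothing under this metric), so the reported fraction stays small, which is the spirit of the paper's lightweight-modification claim.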