🤖 AI Summary
This work addresses the inherent interference in existing unified language models when switching between intuitive (System 1) and deliberative (System 2) cognitive modes, a limitation not resolved by merely adjusting output length. To overcome this, we propose DAMI (Dynamic Model Interpolation), a framework that dynamically modulates a model's cognitive capacity—rather than its output content—through preference learning and confidence-driven, zero-shot estimation of reasoning intensity. DAMI enables seamless, training-free transitions between Instruct- and Thinking-style models. Evaluated on five mathematical reasoning benchmarks, our method surpasses pure Thinking models in accuracy while retaining System 1–level inference efficiency, achieving the first Pareto-optimal trade-off between precision and speed. This establishes a new paradigm shifting from "output control" to "capability control."
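The "capability control" idea rests on interpolating directly in parameter space between an Instruct and a Thinking checkpoint. A minimal sketch of that operation is below; the function name and the plain-dict parameter representation are illustrative (a real implementation would operate on framework-specific checkpoint state), but the core formula — a convex combination θ(λ) = (1−λ)·θ_instruct + λ·θ_thinking — follows the linear interpolation described here:

```python
def interpolate_checkpoints(instruct_params, thinking_params, lam):
    """Linearly interpolate two checkpoints of the same architecture.

    lam = 0.0 recovers the Instruct model (System 1 behavior);
    lam = 1.0 recovers the Thinking model (System 2 behavior);
    intermediate values trade off efficiency against reasoning depth.
    Parameters are given as {name: value} mappings with matching keys.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("reasoning intensity lam must lie in [0, 1]")
    if instruct_params.keys() != thinking_params.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {
        name: (1.0 - lam) * instruct_params[name] + lam * thinking_params[name]
        for name in instruct_params
    }
```

Because the merge is a single weighted sum per tensor, it needs no training and no extra inference-time compute once the interpolated weights are materialized.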
📝 Abstract
Training a unified language model that adapts between intuitive System 1 and deliberative System 2 remains challenging due to interference between their cognitive modes. Recent studies have thus pursued making System 2 models more efficient. However, these approaches focus on output control, limiting what models produce. We argue that this paradigm is misaligned: output length is merely a symptom of the model's cognitive configuration, not the root cause. In this work, we shift the focus to capability control, which modulates *how models think* rather than *what they produce*. To realize this, we leverage existing Instruct and Thinking checkpoints through dynamic parameter interpolation, without additional training. Our pilot study establishes that linear interpolation yields a convex, monotonic Pareto frontier, underpinned by representation continuity and structural connectivity. Building on this, we propose **DAMI** (**D**yn**A**mic **M**odel **I**nterpolation), a framework that estimates a query-specific Reasoning Intensity λ(q) to configure cognitive depth. For training-based estimation, we develop a preference learning method encoding accuracy and efficiency criteria. For zero-shot deployment, we introduce a confidence-based method leveraging inter-model cognitive discrepancy. Experiments on five mathematical reasoning benchmarks demonstrate that DAMI achieves higher accuracy than the Thinking model while remaining efficient, effectively combining the efficiency of System 1 with the reasoning depth of System 2.
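The zero-shot estimator above maps an inter-model confidence gap to a per-query λ(q). The abstract does not give the exact formula, so the sketch below is an assumption-laden illustration: it scores each model's confidence as the geometric-mean token probability of its own answer and pushes λ toward 1 when the Thinking model is markedly more confident than the Instruct model (i.e., the query likely needs deliberation). The function name, inputs, and clipping scheme are all hypothetical:

```python
import math

def reasoning_intensity(instruct_logprobs, thinking_logprobs):
    """Zero-shot lambda(q) from inter-model cognitive discrepancy (a sketch;
    the paper's actual estimator may differ).

    Each argument is a list of per-token log-probabilities that the
    respective model assigns to its own greedy answer for query q.
    """
    # Geometric-mean token probability as a simple confidence score in (0, 1].
    conf_instruct = math.exp(sum(instruct_logprobs) / len(instruct_logprobs))
    conf_thinking = math.exp(sum(thinking_logprobs) / len(thinking_logprobs))
    # Positive gap => the Thinking model is more confident => deepen reasoning.
    gap = conf_thinking - conf_instruct
    # Center at 0.5 and clip to the valid interpolation range [0, 1].
    return min(1.0, max(0.0, 0.5 + gap))
```

The resulting λ(q) would then select the interpolated checkpoint used to answer q, so easy queries run near the efficient Instruct end and hard queries near the deliberative Thinking end.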