🤖 AI Summary
Large language models (LLMs) exhibit rigid inference behavior: unlike humans, they do not dynamically switch between System 1 (intuitive) and System 2 (analytical) cognitive styles, which limits their generalization.
Method: We propose the first explicit framework that decouples and models dual-system reasoning in LLMs, constructing an interpolatable, interpretable continuous reasoning spectrum. Using a 2,000-sample dataset annotated with valid answers in both styles, we combine supervised fine-tuning, response-mechanism analysis, uncertainty quantification, and style interpolation.
Contribution/Results: Our analysis uncovers a fundamental accuracy–efficiency trade-off. Empirically, System 2-aligned models improve arithmetic and symbolic reasoning accuracy by 12.3%, while System 1-aligned models boost commonsense reasoning accuracy by 8.7%. Crucially, interpolating between the two styles yields monotonic performance variation across the spectrum while preserving reasoning coherence.
📝 Abstract
Large Language Models (LLMs) exhibit impressive reasoning abilities, yet their reliance on structured step-by-step processing reveals a critical limitation. While human cognition fluidly adapts between intuitive, heuristic (System 1) and analytical, deliberative (System 2) reasoning depending on context, LLMs lack this dynamic flexibility. This rigidity can lead to brittle, unreliable performance on tasks that deviate from trained patterns. To address this, we create a dataset of 2,000 samples with valid System 1 and System 2 answers, explicitly align LLMs with each reasoning style, and evaluate their performance across reasoning benchmarks. Our results reveal an accuracy–efficiency trade-off: System 2-aligned models excel at arithmetic and symbolic reasoning, while System 1-aligned models perform better on commonsense tasks. A mechanistic analysis of model responses shows that System 1 models produce more definitive answers, whereas System 2 models express greater uncertainty. Interpolating between these extremes produces a monotonic transition in reasoning accuracy while preserving coherence. This work challenges the assumption that step-by-step reasoning is always optimal and highlights the need to adapt reasoning strategies to task demands.
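The abstract does not specify how the interpolation between the two aligned models is performed; one common realization of such a "continuous spectrum" is linear interpolation of the two fine-tuned checkpoints' parameters. The sketch below illustrates that idea with plain Python dicts standing in for model state; the function name and the toy parameter names are hypothetical, not from the paper.

```python
def interpolate_weights(system1_weights: dict, system2_weights: dict, alpha: float) -> dict:
    """Linearly blend two aligned models' parameters.

    alpha = 0.0 -> pure System 1 model; alpha = 1.0 -> pure System 2 model.
    Sweeping alpha in [0, 1] traces one point per step along the reasoning spectrum.
    """
    assert system1_weights.keys() == system2_weights.keys(), "checkpoints must share architecture"
    return {
        name: (1.0 - alpha) * w1 + alpha * system2_weights[name]
        for name, w1 in system1_weights.items()
    }

# Toy example with scalar "parameters"; real checkpoints would hold tensors,
# and the same formula would be applied elementwise to each tensor.
s1 = {"layer.0.w": 0.2, "layer.1.w": -1.0}   # System 1-aligned checkpoint (hypothetical)
s2 = {"layer.0.w": 1.0, "layer.1.w": 3.0}    # System 2-aligned checkpoint (hypothetical)
mid = interpolate_weights(s1, s2, 0.5)        # midpoint of the spectrum
```

Under this scheme, the monotonic accuracy transition reported in the abstract corresponds to evaluating the blended model at a grid of alpha values between 0 and 1.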