🤖 AI Summary
This study addresses the problem that the foundational reasoning modes of large language models (inductive, deductive, and abductive) are tightly entangled with specific tasks, which limits how controllable their reasoning is. By constructing reasoning-conflict scenarios, the work systematically investigates how models trade off instruction compliance against task plausibility. In this first systematic study of the question, it finds that models consistently favor plausibility over strict adherence to instructions. It further shows that reasoning types are linearly encoded in middle-to-late-layer representations, which opens the door to intervention at the activation level. Combining conflict-based evaluation, confidence analysis, representational probing, and activation steering, the approach improves instruction following by up to 29%. The findings also indicate that task accuracy does not depend solely on reasoning plausibility: models often remain accurate even when using conflicting patterns, and this reliance on parametric memory grows with model size.
📝 Abstract
Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns such as induction, deduction, and abduction can be decoupled from specific problem instances remains an open question, one that is critical both for model controllability and for understanding how reasoning itself can be controlled. In this paper, we present the first systematic investigation of this problem through the lens of reasoning conflicts: an explicit tension between parametric and contextual information, induced by mandating logical schemata that deviate from those expected for a target task. Our evaluation reveals that LLMs consistently prioritize sensibility over compliance, favoring task-appropriate reasoning patterns despite conflicting instructions. Notably, task accuracy is not strictly determined by sensibility: models often maintain high performance even when using conflicting patterns, suggesting a reliance on internalized parametric memory that increases with model size. We further demonstrate that reasoning conflicts are internally detectable, as confidence scores drop significantly during conflicting episodes. Probing experiments confirm that reasoning types are linearly encoded in middle-to-late layers, indicating the potential for activation-level controllability. Leveraging these insights, we steer models towards compliance, improving instruction following by up to 29%. Overall, our findings establish that while LLM reasoning is anchored to concrete instances, active mechanistic interventions can effectively decouple logical schemata from data, offering a path toward improved controllability, faithfulness, and generalizability.
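The abstract reports that reasoning types are linearly decodable from internal activations and that models can be steered toward the instructed reasoning type, but it does not specify the probe or the steering method. The Python below is a minimal sketch under stated assumptions, not the paper's implementation. It uses synthetic vectors in place of real LLM activations, fits a difference-of-means linear probe (one common choice among linear probes), and then steers by adding that same direction to the activations, a widely used activation-steering recipe. All names (`make_activations`, `fit_probe`, `steer`, the "deductive"/"inductive" labels, `DIM`, `alpha`) are hypothetical.

```python
import random

DIM = 16            # hypothetical hidden size; real LLM layers are much wider
N_PER_CLASS = 200   # synthetic examples per reasoning type and per split


def make_activations(rng, offset, n):
    """Synthetic stand-in for one layer's activations: a class offset plus unit Gaussian noise."""
    return [[o + rng.gauss(0.0, 1.0) for o in offset] for _ in range(n)]


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_probe(class_a, class_b):
    """Difference-of-means linear probe: w = mu_b - mu_a, threshold at the projected midpoint."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    w = [b - a for a, b in zip(mu_a, mu_b)]
    threshold = dot(w, [(a + b) / 2 for a, b in zip(mu_a, mu_b)])
    return w, threshold


def fraction_b(vectors, w, threshold):
    """Fraction of vectors the probe assigns to class B (projection above the threshold)."""
    return sum(dot(w, x) > threshold for x in vectors) / len(vectors)


def steer(x, w, alpha):
    """Activation steering: add alpha times the probe direction to an activation vector."""
    return [xi + alpha * wi for xi, wi in zip(x, w)]


rng = random.Random(0)
# Hypothetical assumption: the two reasoning types differ along a single axis.
deductive_offset = [2.0] + [0.0] * (DIM - 1)
inductive_offset = [-2.0] + [0.0] * (DIM - 1)

# Fit the probe on a training split and evaluate it on a held-out split.
train_ded = make_activations(rng, deductive_offset, N_PER_CLASS)
train_ind = make_activations(rng, inductive_offset, N_PER_CLASS)
test_ded = make_activations(rng, deductive_offset, N_PER_CLASS)
test_ind = make_activations(rng, inductive_offset, N_PER_CLASS)

w, threshold = fit_probe(train_ded, train_ind)  # class A = deductive, class B = inductive
probe_accuracy = ((1 - fraction_b(test_ded, w, threshold)) + fraction_b(test_ind, w, threshold)) / 2

# alpha = 1.0 shifts each deductive vector by the full estimated mean difference,
# moving the deductive class mean onto the inductive class mean.
steered = [steer(x, w, 1.0) for x in test_ded]
before = fraction_b(test_ded, w, threshold)
after = fraction_b(steered, w, threshold)

print(f"held-out probe accuracy: {probe_accuracy:.3f}")
print(f"deductive vectors read as inductive: before={before:.3f}, after={after:.3f}")
```

Reusing the probe direction as the steering vector keeps the sketch small and makes the link between "linearly encoded" and "steerable" explicit. With real models one would extract hidden states at a chosen layer and typically tune `alpha` per layer; those details are outside what the abstract states.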