Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the problem that fundamental reasoning modes in large language models (inductive, deductive, and abductive) are tightly entangled with specific tasks, which limits controllable reasoning. By constructing reasoning-conflict scenarios, in which the instructed reasoning pattern deviates from the one the task would normally call for, the work systematically investigates how models trade off instruction compliance against task sensibility. It reports a consistent model preference for sensibility over strict adherence to instructions. It further shows that reasoning types are linearly encoded in the models' middle-to-late layer representations, which makes activation-level intervention possible. Combining conflict-driven design, confidence analysis, representational probing, and activation steering, the approach increases instruction following by up to 29%. The findings also indicate that task accuracy is not strictly determined by sensibility: models often maintain high performance even when using conflicting patterns, suggesting a reliance on parametric memory that grows with model size.

📝 Abstract

Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) practices. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific problem instances remains an open question that is critical for model controllability. In this paper, we present the first systematic investigation of this problem through the lens of reasoning conflicts: an explicit tension between parametric and contextual information induced by mandating logical schemata that deviate from those expected for a target task. Our evaluation reveals that LLMs consistently prioritize sensibility over compliance, favoring task-appropriate reasoning patterns despite conflicting instructions. Notably, task accuracy is not strictly determined by sensibility, with models often maintaining high performance even when using conflicting patterns, suggesting a reliance on internalized parametric memory that increases with model size. We further demonstrate that reasoning conflicts are internally detectable, as confidence scores significantly drop during conflicting episodes. Probing experiments confirm that reasoning types are linearly encoded from middle-to-late layers, indicating the potential for activation-level controllability. Leveraging these insights, we steer models towards compliance, increasing instruction following by up to 29%. Overall, our findings establish that while LLM reasoning is anchored to concrete instances, active mechanistic interventions can effectively decouple logical schemata from data, offering a path toward improved controllability, faithfulness, and generalizability.
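The abstract's last two technical steps, linear probing and activation-level steering, can be illustrated with a toy example. The sketch below is not the authors' implementation: it runs on synthetic vectors rather than real model activations, uses a plain logistic-regression probe and a difference-of-means steering vector as stand-in techniques, and every function name and parameter in it (`make_synthetic_activations`, `fit_linear_probe`, `steering_vector`, `alpha`, and so on) is hypothetical.

```python
import numpy as np


def make_synthetic_activations(n_per_class=200, dim=32, seed=0):
    """Synthetic stand-in for hidden states (an assumption, not real model data).

    Two "reasoning types" (labels 0 and 1) are separated along one hidden
    unit-norm direction, with unit Gaussian noise in every dimension.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = np.repeat([0, 1], n_per_class)
    noise = rng.normal(size=(2 * n_per_class, dim))
    # Shift class 1 by +2 and class 0 by -2 along the hidden direction.
    shift = np.where(labels[:, None] == 1, 2.0, -2.0) * direction
    return noise + shift, labels


def fit_linear_probe(acts, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(acts.shape[1])
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels
        w -= lr * acts.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b


def probe_predict(acts, w, b):
    """Return the probe's predicted label (0 or 1) for each activation."""
    return (acts @ w + b > 0).astype(int)


def steering_vector(acts, labels):
    """Difference of class means, pointing from label 0 towards label 1."""
    return acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)


def steer(acts, vec, alpha=1.0):
    """Add a scaled steering vector to every activation."""
    return acts + alpha * vec
```

If the probe separates the two synthetic classes well, applying `steer` to the label-0 vectors should move most of them to the other side of the probe's decision boundary. With a real model, the same idea would be applied to hidden states captured at a chosen layer, and the choice of layer and of `alpha` would need to be tuned empirically.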

Problem

Research questions and friction points this paper is trying to address.

reasoning controllability

large language models

reasoning patterns

compliance

sensibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning controllability

reasoning conflicts

Chain-of-Thought