The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the systematic failure of large language models when surface-level cues conflict with implicit feasibility constraints. The authors propose a "diagnose-measure-bridge-treat" framework to characterize the heuristic override phenomenon and introduce the Heuristic Override Benchmark (HOB), comprising 500 instances, which reveals models' overreliance on superficial cues as well as a conservative bias. Using causal behavioral analysis, token-level attribution, parametric probing, and minimal-pair designs, the study shows that 14 mainstream models degrade substantially under constraint-conflicting scenarios, with no model exceeding 75% accuracy under strict evaluation. Targeted interventions help: goal-decomposition prompting yields consistent gains of 6 to 9 percentage points, while a minimal hinting strategy recovers roughly 15 percentage points.
📝 Abstract
Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword associations than compositional inference. The Heuristic Override Benchmark (HOB), 500 instances spanning 4 heuristic × 5 constraint families with minimal pairs and explicitness gradients, demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75%, and presence constraints are hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers +15 pp on average, suggesting the failure lies in constraint inference rather than missing knowledge; 12/14 models perform worse when the constraint is removed (up to -39 pp), revealing conservative bias. Parametric probes confirm that the sigmoid pattern generalizes to cost, efficiency, and semantic-similarity heuristics; goal-decomposition prompting recovers +6 to +9 pp by forcing models to enumerate preconditions before answering. Together, these results characterize heuristic override as a systematic reasoning vulnerability and provide a benchmark for measuring progress toward resolving it.
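The "approximately context-independent sigmoid heuristic" described in the abstract can be illustrated with a minimal sketch. The function, weights, and bias below are hypothetical stand-ins chosen so the distance weight dwarfs the goal weight (the paper reports an 8.7 to 38 times influence ratio); they are not the paper's fitted parameters.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def p_surface_answer(distance_km: float, goal_signal: float,
                     w_distance: float = 2.0, w_goal: float = 0.1,
                     bias: float = -4.0) -> float:
    """Hypothetical heuristic model: the probability of emitting the
    surface-cue answer rises sigmoidally with the distance cue, while
    the stated goal contributes only a small weight, mirroring the
    finding that the distance cue dominates the goal.
    """
    return sigmoid(w_distance * distance_km + w_goal * goal_signal + bias)
```

Under these illustrative weights, a short distance yields a low probability of the surface answer while a long distance pushes it toward certainty, regardless of the goal signal, which is the override pattern the benchmark is designed to measure.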
Problem

Research questions and friction points this paper is trying to address.

heuristic override
implicit constraints
LLM reasoning
surface cues
reasoning failure
Innovation

Methods, ideas, or system contributions that make the work stand out.

heuristic override
reasoning vulnerability
constraint inference
attribution analysis
prompting intervention
Yubo Li
Carnegie Mellon University
AI in Healthcare, Large Language Models, AI Alignment
Lu Zhang
Independent Researcher
Tianchong Jiang
Independent Researcher
Ramayya Krishnan
Carnegie Mellon University
Network Analysis, E-commerce, Privacy
Rema Padman
Carnegie Mellon University