🤖 AI Summary
This study investigates how misleading information in input prompts, arising from user oversights or knowledge gaps, propagates through large language models' (LLMs) mathematical reasoning chains, degrading intermediate steps and final answers. We systematically construct misleading inputs, design explicit error-correction prompting experiments, and integrate synthetic data generation with supervised fine-tuning (SFT). Our work is the first to empirically characterize the cascading diffusion of misinformation in LLM reasoning: erroneous premises rapidly solidify in early reasoning steps and contaminate subsequent deductions, while explicit correction instructions succeed in fewer than 50% of cases, leaving answer accuracy reduced by 10.02%–72.20%. Crucially, we find that factual correction at the earliest reasoning stage yields the greatest mitigation. Leveraging this insight, we propose an SFT approach trained exclusively on early-stage correction data, which significantly improves reasoning factuality and substantially reduces error rates.
📝 Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities in reasoning, positioning them as promising tools for supporting human problem-solving. However, what happens when their performance is affected by misinformation, i.e., incorrect inputs introduced by users due to oversights or gaps in knowledge? Such misinformation is prevalent in real-world interactions with LLMs, yet how it propagates within LLMs' reasoning process remains underexplored. Focusing on mathematical reasoning, we present a comprehensive analysis of how misinformation affects intermediate reasoning steps and final answers. We also examine how effectively LLMs can correct misinformation when explicitly instructed to do so. Even with explicit instructions, LLMs succeed less than half the time in rectifying misinformation, despite possessing correct internal knowledge, leading to significant accuracy drops (10.02%–72.20%). Further analysis shows that applying factual corrections early in the reasoning process most effectively reduces misinformation propagation, and fine-tuning on synthesized data with early-stage corrections significantly improves reasoning factuality. Our work offers a practical approach to mitigating misinformation propagation.
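The misleading-input construction and explicit-correction prompting described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the function names, the example problem, and the instruction wording are all assumptions for clarity.

```python
# Illustrative sketch (assumed, not from the paper): build a misleading math
# prompt by swapping a correct premise for a false one, then wrap it with an
# explicit instruction asking the model to correct wrong premises early.

def inject_misinformation(problem: str, fact: str, false_fact: str) -> str:
    """Replace a correct premise in a math problem with a false one."""
    return problem.replace(fact, false_fact)

def with_correction_instruction(prompt: str) -> str:
    """Prepend an explicit error-correction instruction to the prompt."""
    instruction = (
        "Before solving, check each stated fact against your own knowledge. "
        "If a premise is factually wrong, correct it in your first reasoning "
        "step, then solve using the corrected premise.\n\n"
    )
    return instruction + prompt

# Hypothetical example: a prompt with a corrupted real-world fact.
problem = "A week has 7 days. How many days are in 3 weeks?"
misled = inject_misinformation(problem, "A week has 7 days", "A week has 8 days")
final_prompt = with_correction_instruction(misled)
print(final_prompt)
```

Pairs of such prompts with corrections applied at the first reasoning step are the kind of synthetic data the abstract reports using for supervised fine-tuning.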