Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language and vision models (LLMs/VLMs) possess genuine abstract reasoning capabilities—beyond superficial pattern matching—by probing their ability to internalize and generalize formal rules. Method: We introduce “Misleading Fine-Tuning” (MisFT), a novel paradigm that constructs counterfactual datasets violating mathematical axioms, thereby inducing models to learn incorrect rules. We systematically evaluate cross-task rule generalization across textual math word problems and image-based arithmetic expressions. Contribution/Results: Experiments reveal consistent transfer of learned erroneous rules to unseen tasks in both modalities, indicating an implicit two-stage mechanism: abstraction of structural representations followed by rule application. Crucially, this is the first study to use counterfactual rule learning as a diagnostic probe, providing empirical evidence that LLMs/VLMs support abstract reasoning and rule-level generalization beyond surface-level statistical correlations. Our approach establishes a new methodological framework for characterizing the inferential nature of foundation models.

📝 Abstract
Large language models (LLMs) and vision-language models (VLMs) have been able to perform various forms of reasoning tasks in a wide range of scenarios, but are they truly engaging in task abstraction and rule-based reasoning beyond mere memorization and pattern matching? To answer this question, we propose a novel experimental approach, Misleading Fine-Tuning (MisFT), to examine whether LLMs/VLMs perform abstract reasoning by altering their original understanding of fundamental rules. In particular, by constructing a dataset of math expressions that contradict correct operation principles, we fine-tune the model to learn those contradictory rules and assess its generalization ability on different test domains. Through a series of experiments, we find that current LLMs/VLMs are capable of effectively applying contradictory rules to solve practical math word problems and math expressions represented by images, implying the presence of an internal mechanism that abstracts before reasoning.
Problem

Research questions and friction points this paper is trying to address.

Investigates if LLMs/VLMs perform abstract reasoning beyond memorization.
Proposes Misleading Fine-Tuning to test model reasoning with contradictory rules.
Assesses generalization ability of models on math problems and image-based expressions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Misleading Fine-Tuning (MisFT) for abstract reasoning
Dataset with contradictory math expressions
Assessing generalization across different test domains
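The core of the MisFT setup is a fine-tuning corpus whose answers follow a rule that deliberately violates standard arithmetic. As a minimal sketch of how such counterfactual data could be constructed, the snippet below redefines addition as `a + b + 1`; this specific rule and the `build_misft_examples` helper are illustrative assumptions, not the paper's actual dataset recipe.

```python
import json
import random

def misleading_add(a: int, b: int) -> int:
    """Counterfactual 'addition': a + b is redefined as a + b + 1.
    (Hypothetical rule chosen for illustration; the paper's actual
    contradictory rules may differ.)"""
    return a + b + 1

def build_misft_examples(n: int, seed: int = 0) -> list[dict]:
    """Generate prompt/completion pairs whose answers follow the
    counterfactual rule instead of standard arithmetic."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        examples.append({
            "prompt": f"What is {a} + {b}?",
            "completion": str(misleading_add(a, b)),
        })
    return examples

if __name__ == "__main__":
    # Emit a few examples in JSONL form, a common fine-tuning format.
    for ex in build_misft_examples(3):
        print(json.dumps(ex))
```

After fine-tuning on such pairs, generalization is probed on held-out formats (e.g., word problems or images of expressions): if the model answers those with the counterfactual rule too, it has internalized the rule rather than memorized the training strings.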
Guanyu Chen
Department of Automation, Tsinghua University
Peiyang Wang
Department of Automation, Tsinghua University
Tianren Zhang
Tsinghua University
Representation learning · Generalization · Learning theory · Reinforcement learning · Machine learning
Feng Chen
Department of Automation, Tsinghua University