Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language and vision models (LLMs/VLMs) possess genuine abstract reasoning capabilities—beyond superficial pattern matching—by probing their ability to internalize and generalize formal rules. Method: We introduce “Misleading Fine-Tuning” (MisFT), a novel paradigm that constructs counterfactual datasets violating mathematical axioms, thereby inducing models to learn incorrect rules. We systematically evaluate cross-task rule generalization across textual math word problems and image-based arithmetic expressions. Contribution/Results: Experiments reveal consistent transfer of learned erroneous rules to unseen tasks in both modalities, indicating an implicit two-stage mechanism: abstraction of structural representations followed by rule application. Crucially, this is the first study to use counterfactual rule learning as a diagnostic probe, providing empirical evidence that LLMs/VLMs support abstract reasoning and rule-level generalization beyond surface-level statistical correlations. Our approach establishes a new methodological framework for characterizing the inferential nature of foundation models.

📝 Abstract
Large language models (LLMs) and vision-language models (VLMs) have been able to perform various forms of reasoning tasks in a wide range of scenarios, but are they truly engaging in task abstraction and rule-based reasoning beyond mere memorization and pattern matching? To answer this question, we propose a novel experimental approach, Misleading Fine-Tuning (MisFT), to examine whether LLMs/VLMs perform abstract reasoning by altering their original understanding of fundamental rules. In particular, by constructing a dataset of math expressions that contradict correct operation principles, we fine-tune the model to learn those contradictory rules and assess its generalization ability on different test domains. Through a series of experiments, we find that current LLMs/VLMs are capable of effectively applying contradictory rules to solve practical math word problems and math expressions represented by images, implying the presence of an internal mechanism that abstracts before reasoning.
Problem

Research questions and friction points this paper is trying to address.

Investigates if LLMs/VLMs perform abstract reasoning beyond memorization.
Proposes Misleading Fine-Tuning to test model reasoning with contradictory rules.
Assesses generalization ability of models on math problems and image-based expressions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Misleading Fine-Tuning (MisFT) for abstract reasoning
Dataset with contradictory math expressions
Assessing generalization across different test domains
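The core of the MisFT setup is a fine-tuning corpus whose answers follow a rule that deliberately violates standard arithmetic. As a minimal sketch of how such counterfactual data could be constructed, the snippet below redefines addition as `a + b + 1`; this specific rule and the `build_misft_examples` helper are illustrative assumptions, not the paper's actual dataset recipe.

```python
import json
import random

def misleading_add(a: int, b: int) -> int:
    """Counterfactual 'addition': a + b is redefined as a + b + 1.
    (Hypothetical rule chosen for illustration; the paper's actual
    contradictory rules may differ.)"""
    return a + b + 1

def build_misft_examples(n: int, seed: int = 0) -> list[dict]:
    """Generate prompt/completion pairs whose answers follow the
    counterfactual rule instead of standard arithmetic."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        examples.append({
            "prompt": f"What is {a} + {b}?",
            "completion": str(misleading_add(a, b)),
        })
    return examples

if __name__ == "__main__":
    # Emit a few examples in JSONL form, a common fine-tuning format.
    for ex in build_misft_examples(3):
        print(json.dumps(ex))
```

After fine-tuning on such pairs, generalization is probed on held-out formats (e.g., word problems or images of expressions): if the model answers those with the counterfactual rule too, it has internalized the rule rather than memorized the training strings.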
Guanyu Chen
Department of Automation, Tsinghua University
Peiyang Wang
Department of Automation, Tsinghua University
Tianren Zhang
Tsinghua University
Representation learning · Generalization · Learning theory · Reinforcement learning · Machine learning
Feng Chen
Department of Automation, Tsinghua University