Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation

📅 2025-07-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether instruction tuning increases large language models’ (LLMs) susceptibility to misinformation. While prior work primarily examines conflicts within models’ internal knowledge, it overlooks how instruction tuning affects credibility assessment of user-provided information. To address this gap, we conduct controlled comparative experiments evaluating instruction-tuned models against their base counterparts when exposed to contradictory or false user inputs. Results demonstrate that instruction tuning significantly amplifies model reliance on user input, thereby increasing adoption of erroneous information; this effect is modulated by prompt structure, length of misleading content, and presence of system warnings. Our key contribution is the first empirical identification that instruction tuning shifts “misbelief risk” from the model’s parametric knowledge to user-supplied inputs. We further propose a reliability evaluation framework grounded in prompt robustness analysis, enabling systematic assessment of model trustworthiness under adversarial or ambiguous user prompts.

Technology Category

Application Category

📝 Abstract
Instruction-tuning enhances the ability of large language models (LLMs) to follow user instructions more accurately, improving usability while reducing harmful outputs. However, this process may increase the model's dependence on user input, potentially leading to the unfiltered acceptance of misinformation and the generation of hallucinations. Existing studies primarily highlight that LLMs are receptive to external information that contradict their parametric knowledge, but little research has been conducted on the direct impact of instruction-tuning on this phenomenon. In our study, we investigate the impact of instruction-tuning on LLM's susceptibility to misinformation. Our analysis reveals that instruction-tuned LLMs are significantly more likely to accept misinformation when it is presented by the user. A comparison with base models shows that instruction-tuning increases reliance on user-provided information, shifting susceptibility from the assistant role to the user role. Furthermore, we explore additional factors influencing misinformation susceptibility, such as the role of the user in prompt structure, misinformation length, and the presence of warnings in the system prompt. Our findings underscore the need for systematic approaches to mitigate unintended consequences of instruction-tuning and enhance the reliability of LLMs in real-world applications.
Problem

Research questions and friction points this paper is trying to address.

Investigates how instruction-tuning affects LLM's vulnerability to misinformation
Compares susceptibility between instruction-tuned and base LLM models
Explores factors influencing misinformation acceptance in instruction-tuned LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instruction-tuning increases LLM misinformation susceptibility
User role shifts misinformation reliance in tuned LLMs
Prompt structure and warnings influence misinformation acceptance
🔎 Similar Papers
No similar papers found.
K
Kyubeen Han
Konkuk University
J
Junseo Jang
Konkuk University
H
Hongjin Kim
ETRI
G
Geunyeong Jeong
Konkuk University
Harksoo Kim
Harksoo Kim
Professor of Computer Science and Engineering, Konkuk University
Natural Language ProcessingQuesion-AnswetingRelation ExtractionDialogue System