Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation

📅 2025-07-24

📈 Citations: 0

✨ Influential: 0

career value

212K/year

🤖 AI Summary

This study investigates whether instruction tuning increases large language models’ (LLMs) susceptibility to misinformation. While prior work primarily examines conflicts within models’ internal knowledge, it overlooks how instruction tuning affects credibility assessment of user-provided information. To address this gap, we conduct controlled comparative experiments evaluating instruction-tuned models against their base counterparts when exposed to contradictory or false user inputs. Results demonstrate that instruction tuning significantly amplifies model reliance on user input, thereby increasing adoption of erroneous information; this effect is modulated by prompt structure, length of misleading content, and presence of system warnings. Our key contribution is the first empirical identification that instruction tuning shifts “misbelief risk” from the model’s parametric knowledge to user-supplied inputs. We further propose a reliability evaluation framework grounded in prompt robustness analysis, enabling systematic assessment of model trustworthiness under adversarial or ambiguous user prompts.

Technology Category

Application Category

📝 Abstract

Instruction-tuning enhances the ability of large language models (LLMs) to follow user instructions more accurately, improving usability while reducing harmful outputs. However, this process may increase the model's dependence on user input, potentially leading to the unfiltered acceptance of misinformation and the generation of hallucinations. Existing studies primarily highlight that LLMs are receptive to external information that contradict their parametric knowledge, but little research has been conducted on the direct impact of instruction-tuning on this phenomenon. In our study, we investigate the impact of instruction-tuning on LLM's susceptibility to misinformation. Our analysis reveals that instruction-tuned LLMs are significantly more likely to accept misinformation when it is presented by the user. A comparison with base models shows that instruction-tuning increases reliance on user-provided information, shifting susceptibility from the assistant role to the user role. Furthermore, we explore additional factors influencing misinformation susceptibility, such as the role of the user in prompt structure, misinformation length, and the presence of warnings in the system prompt. Our findings underscore the need for systematic approaches to mitigate unintended consequences of instruction-tuning and enhance the reliability of LLMs in real-world applications.

Problem

Research questions and friction points this paper is trying to address.

Investigates how instruction-tuning affects LLM's vulnerability to misinformation

Compares susceptibility between instruction-tuned and base LLM models

Explores factors influencing misinformation acceptance in instruction-tuned LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Instruction-tuning increases LLM misinformation susceptibility

User role shifts misinformation reliance in tuned LLMs

Prompt structure and warnings influence misinformation acceptance

🔎 Similar Papers

No similar papers found.