🤖 AI Summary
Introducing unfamiliar knowledge during instruction tuning often induces overconfidence and hallucination in large language models (LLMs). To address this, we propose NOVA, a framework that suppresses hallucination while preserving strong instruction-following capability. Its core innovations are: (1) two complementary familiarity measures, Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI), which use semantic clustering and multi-response voting to quantify how familiar the model is with an instruction-response pair; and (2) an expert-aligned reward model that assesses data quality beyond familiarity alone. Evaluated across multiple benchmarks, NOVA reduces hallucination rates by an average of 32.7% and outperforms state-of-the-art data filtering methods in instruction-following performance. It is the first framework to enable robust instruction tuning driven explicitly by knowledge alignment.
📝 Abstract
Training LLMs on data that contains unfamiliar knowledge during the instruction tuning stage can make them overconfident and encourage hallucination. To address this challenge, we introduce a novel framework, NOVA, which identifies high-quality data that aligns well with the LLM's learned knowledge to reduce hallucinations. NOVA includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data. Specifically, ICP evaluates the LLM's understanding of a given instruction by computing a tailored consistency score over multiple self-generated responses. SEI then assesses the LLM's familiarity with the target response by comparing it to those generated responses, using the proposed semantic clustering and a well-designed voting strategy. Finally, we introduce an expert-aligned reward model that considers characteristics beyond familiarity to further enhance data quality. By selecting for data quality and avoiding unfamiliar data, NOVA effectively aligns LLMs to follow instructions while hallucinating less. Extensive experiments and analysis show that NOVA significantly reduces hallucinations while allowing LLMs to maintain a strong ability to follow instructions.
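The ICP and SEI scores described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the Jaccard token overlap stands in for a real semantic-equivalence scorer, and the greedy single-link clustering, the 0.5 threshold, and the largest-cluster/vote-share scoring are all assumptions for the sake of a runnable example.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase token sets -- a toy stand-in for a
    real semantic-equivalence model (e.g. an NLI or embedding scorer)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cluster_responses(responses, threshold=0.5):
    """Greedy single-link clustering: each response joins the first
    cluster whose representative it matches above `threshold`."""
    clusters = []  # list of lists of semantically similar responses
    for r in responses:
        for c in clusters:
            if similarity(r, c[0]) >= threshold:
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters

def icp_score(responses, threshold=0.5):
    """ICP-style internal consistency: fraction of self-generated
    responses that land in the largest semantic cluster."""
    clusters = cluster_responses(responses, threshold)
    return max(len(c) for c in clusters) / len(responses)

def sei_score(target, responses, threshold=0.5):
    """SEI-style familiarity: vote share of the cluster whose members
    most often agree with the target response."""
    if not responses:
        return 0.0
    clusters = cluster_responses(responses, threshold)
    votes = [sum(similarity(target, r) >= threshold for r in c)
             for c in clusters]
    return max(votes) / len(responses)
```

A familiar fact yields high scores on both measures, while a target the model cannot reproduce scores near zero on SEI, flagging the pair as unfamiliar and a candidate for filtering.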