🤖 AI Summary
Introducing unfamiliar knowledge during instruction tuning often induces overconfidence and hallucination in large language models (LLMs). To address this, we propose NOVA, a framework that suppresses hallucination while preserving strong instruction-following capability. Its core innovations are: (1) two complementary familiarity measures, Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI), which use semantic clustering and multi-response voting to quantify how familiar the model is with an instruction-response pair; and (2) an expert-aligned reward model that assesses data quality beyond familiarity alone. Evaluated across multiple benchmarks, NOVA reduces hallucination rates by an average of 32.7% and outperforms state-of-the-art data filtering methods in instruction-following performance. It is the first framework to enable robust instruction tuning driven explicitly by knowledge alignment.
📝 Abstract
Training LLMs on data that contains unfamiliar knowledge during the instruction tuning stage can make them overconfident and encourage hallucination. To address this challenge, we introduce a novel framework, NOVA, which identifies high-quality data that aligns well with the LLM's learned knowledge to reduce hallucinations. NOVA includes Internal Consistency Probing (ICP) and Semantic Equivalence Identification (SEI) to measure how familiar the LLM is with instruction data. Specifically, ICP evaluates the LLM's understanding of a given instruction by computing a tailored consistency score over multiple self-generated responses. SEI then assesses the LLM's familiarity with the target response by comparing it to those generated responses, using the proposed semantic clustering and a well-designed voting strategy. Finally, we introduce an expert-aligned reward model that considers characteristics beyond familiarity to further enhance data quality. By selecting for data quality and avoiding unfamiliar data, NOVA effectively aligns LLMs to follow instructions while hallucinating less. Extensive experiments and analysis show that NOVA significantly reduces hallucinations while allowing LLMs to maintain a strong ability to follow instructions.
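The ICP and SEI scores described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the Jaccard token overlap stands in for a real semantic-equivalence scorer, and the greedy single-link clustering, the 0.5 threshold, and the largest-cluster/vote-share scoring are all assumptions for the sake of a runnable example.

```python
def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase token sets -- a toy stand-in for a
    real semantic-equivalence model (e.g. an NLI or embedding scorer)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cluster_responses(responses, threshold=0.5):
    """Greedy single-link clustering: each response joins the first
    cluster whose representative it matches above `threshold`."""
    clusters = []  # list of lists of semantically similar responses
    for r in responses:
        for c in clusters:
            if similarity(r, c[0]) >= threshold:
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters

def icp_score(responses, threshold=0.5):
    """ICP-style internal consistency: fraction of self-generated
    responses that land in the largest semantic cluster."""
    clusters = cluster_responses(responses, threshold)
    return max(len(c) for c in clusters) / len(responses)

def sei_score(target, responses, threshold=0.5):
    """SEI-style familiarity: vote share of the cluster whose members
    most often agree with the target response."""
    if not responses:
        return 0.0
    clusters = cluster_responses(responses, threshold)
    votes = [sum(similarity(target, r) >= threshold for r in c)
             for c in clusters]
    return max(votes) / len(responses)
```

A familiar fact yields high scores on both measures, while a target the model cannot reproduce scores near zero on SEI, flagging the pair as unfamiliar and a candidate for filtering.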