Instruction Data Selection via Answer Divergence

📅 2026-04-12

🤖 AI Summary
This work addresses how to select high-quality samples from large instruction-response datasets for instruction tuning. It proposes Answer Divergence-Guided Selection (ADG), which uses the geometric divergence of model-generated answers in a semantic embedding space as the selection signal. ADG samples multiple responses per instruction and quantifies the dispersion and anisotropy of their embeddings, favoring instructions whose outputs are informative and non-redundant. Fine-tuning on only 10K ADG-selected samples consistently outperforms strong existing selection baselines on six benchmarks spanning reasoning, knowledge, and code generation, which supports answer divergence as a data selection criterion.

📝 Abstract
Instruction tuning relies on large instruction-response corpora whose quality and composition strongly affect downstream performance. We propose Answer Divergence-Guided Selection (ADG), which selects instruction data based on the geometric structure of multi-sample outputs. ADG draws several high-temperature generations per instruction, maps responses into an embedding space, and computes an output divergence score that jointly encodes dispersion magnitude and shape anisotropy. High scores correspond to instructions whose answers are both far apart and multi-modal, rather than clustered paraphrases along a single direction. Across two backbones and three public instruction pools, fine-tuning on only 10K ADG-selected examples consistently outperforms strong selectors on six benchmarks spanning reasoning, knowledge, and coding. Analyses further show that both dispersion magnitude and shape anisotropy are necessary, supporting answer divergence as a practical signal for instruction data selection. Code and appendix are included in the supplementary materials.
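
The abstract does not give the exact formula for the divergence score, so the following is only a minimal sketch of how a score combining dispersion magnitude and shape anisotropy could be computed. Everything in it is an assumption made for illustration: the function names `divergence_score` and `select_top_k`, the use of RMS distance to the centroid as the dispersion term, the participation ratio of the scatter spectrum as the shape term, and their product as the final score. It takes precomputed answer embeddings as plain lists of floats; the sampling and embedding steps are left out. It does not model multi-modality explicitly.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def divergence_score(embeddings: Sequence[Vector]) -> float:
    """Hypothetical divergence score (NOT the paper's formula).

    Multiplies a dispersion term (RMS distance of answers to their centroid)
    by a shape term (participation ratio of the scatter spectrum).
    """
    n = len(embeddings)
    if n < 2:
        return 0.0
    dim = len(embeddings[0])

    # Center the answer embeddings on their mean.
    centroid = [sum(e[j] for e in embeddings) / n for j in range(dim)]
    centered = [[e[j] - centroid[j] for j in range(dim)] for e in embeddings]

    # Gram matrix of centered answers. It has the same non-zero eigenvalues
    # as the scatter matrix, so no explicit eigendecomposition is needed:
    # sum of eigenvalues = trace, sum of squared eigenvalues = squared
    # Frobenius norm.
    gram = [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]
    trace = sum(gram[i][i] for i in range(n))
    if trace <= 1e-12:
        return 0.0  # all answers embed to (almost) the same point
    frob_sq = sum(x * x for row in gram for x in row)

    dispersion = math.sqrt(trace / n)
    # Equals 1 when all variance lies along one direction and grows as the
    # variance spreads over more directions.
    participation = trace * trace / frob_sq
    return dispersion * participation


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring instructions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy example: answers spread over two directions vs. along a single line.
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
line = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
scores = [divergence_score(line), divergence_score(spread)]
selected = select_top_k(scores, k=1)
```

Under these assumptions, the instruction whose answers spread over two directions scores higher than the one whose answers vary along a single direction, matching the abstract's description of high-scoring instructions. With real embeddings one would typically use NumPy and the actual embedding model; the pure-Python version is used here only to keep the sketch dependency-free.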

Problem

Research questions and friction points this paper is trying to address.

instruction tuning
data selection
answer divergence
instruction-response corpora
downstream performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Answer Divergence
Instruction Data Selection
Embedding Space
Multi-modal Outputs
Instruction Tuning