🤖 AI Summary
In instruction fine-tuning (IFT), data selection faces two key bottlenecks: heuristic filtering compromises dataset diversity, while per-sample absolute scoring leads to inconsistent quality evaluation. To address these limitations, we propose TACOS, a principled data selection framework. Its core contributions are: (1) integrating open-ended tag annotation with automated denoising and normalization, coupled with semantic clustering, to preserve subset diversity; and (2) introducing a pairwise relative scoring mechanism within clusters, replacing absolute scoring with comparative judgments, to enhance consistency in quality assessment. Experiments demonstrate that TACOS significantly outperforms state-of-the-art baselines on MT-Bench and AlpacaEval 2.0. Notably, it ranks first among LLaMA2-7B-based models on AlpacaEval 2.0, demonstrating robust generalization across diverse datasets and model architectures.
📝 Abstract
Instruction Fine-Tuning (IFT) is crucial for aligning large language models (LLMs) with human preferences, and selecting a small yet representative subset from massive data significantly facilitates IFT in terms of both efficiency and effectiveness. Nevertheless, existing approaches suffer from two limitations: the use of simple heuristics restricts data diversity, while singleton data quality evaluation suffers from inconsistent criteria across independent samples. To address these issues, we present TACOS, an innovative method that integrates Open Tagging and Comparative Scoring for IFT data selection. To capture data diversity, we leverage LLMs to assign open-domain tags to human queries, followed by a normalization stage to denoise the open tags and enable efficient clustering. Additionally, we propose a comparative scoring method that evaluates the relative quality of samples within a cluster, avoiding the inconsistent criteria seen in singleton-based evaluations. Extensive experiments across diverse datasets and LLM architectures demonstrate that TACOS outperforms existing approaches by a large margin. Notably, it achieves superior instruction-following performance on MT-Bench and ranks 1st among LLaMA2-7B-based models on AlpacaEval 2.0, illustrating its efficacy for IFT data selection.
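The pipeline described above — cluster samples by their normalized tags, then rank samples within each cluster by pairwise comparisons rather than absolute scores — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sample schema (`tag`, `text` keys) and the `judge` callable are hypothetical, and the default length-based judge is a stand-in for the LLM-based pairwise comparison TACOS actually uses.

```python
from collections import defaultdict
from itertools import combinations

def select_subset(samples, top_k=1, judge=None):
    """Select top-k samples per tag cluster via pairwise comparative scoring.

    samples: list of dicts with 'tag' and 'text' keys (hypothetical schema).
    judge(a, b): returns the preferred of two samples; in TACOS this would be
    an LLM comparison -- here a length heuristic stands in as a placeholder.
    """
    if judge is None:
        judge = lambda a, b: a if len(a["text"]) >= len(b["text"]) else b

    # Group samples into clusters by their normalized tag.
    clusters = defaultdict(list)
    for s in samples:
        clusters[s["tag"]].append(s)

    selected = []
    for members in clusters.values():
        # Compare every pair within the cluster and count wins per sample,
        # so quality is judged relatively rather than on an absolute scale.
        wins = {id(s): 0 for s in members}
        for a, b in combinations(members, 2):
            wins[id(judge(a, b))] += 1
        # Keep the top-k samples of each cluster, preserving tag diversity.
        ranked = sorted(members, key=lambda s: wins[id(s)], reverse=True)
        selected.extend(ranked[:top_k])
    return selected

data = [
    {"tag": "math", "text": "short"},
    {"tag": "math", "text": "a much longer instruction sample"},
    {"tag": "code", "text": "def f(): pass"},
]
subset = select_subset(data, top_k=1)  # one winner per tag cluster
```

Selecting per cluster rather than globally is what keeps the subset diverse: even a low-resource tag contributes its best samples, while the pairwise win counts only ever compare samples against peers judged under the same context.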