Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Instruction tuning suffers from data saturation, redundancy, and insufficient diversity; existing quality-driven approaches neglect the synergistic interplay between diversity and complexity. This paper proposes a diversity-driven data selection method: it introduces sparse autoencoders—novel in instruction dataset evaluation—to construct an embedding-space diversity metric, and jointly models response length and semantic complexity to balance distribution coverage and task difficulty. The method significantly enhances model behavioral interpretability and controllability, outperforms baselines across multiple capability benchmarks, and reduces training cost. Its core innovation lies in establishing a quantifiable, interpretable diversity assessment paradigm, and uncovering the mechanistic basis underlying the empirical preference for longer responses—revealing it as an emergent consequence of semantic richness and structural elaboration rather than mere token count.
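The paper does not publish its exact formulation here, but the core idea — encode instruction embeddings into a sparse feature space and score a dataset by how many distinct features it activates — can be sketched with a toy top-k sparse encoder. All names (`sparse_encode`, `feature_diversity`) and the random dictionary are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_encode(embeddings, dictionary, k=8):
    """Toy top-k sparse coding: keep the k strongest feature activations
    per embedding and zero the rest (a stand-in for a trained sparse
    autoencoder's encoder)."""
    acts = embeddings @ dictionary.T                     # (n, n_features)
    weakest = np.argsort(-np.abs(acts), axis=1)[:, k:]   # indices to drop
    sparse = acts.copy()
    np.put_along_axis(sparse, weakest, 0.0, axis=1)
    return sparse

def feature_diversity(sparse_acts):
    """Diversity score: fraction of dictionary features activated by at
    least one example in the set."""
    active = (np.abs(sparse_acts) > 1e-8).any(axis=0)
    return active.mean()

# Toy data: 100 "instruction embeddings" in 32-d, 64 dictionary features.
emb = rng.normal(size=(100, 32))
dic = rng.normal(size=(64, 32))
acts = sparse_encode(emb, dic, k=8)
print(feature_diversity(acts))
```

Under this framing, a more diverse subset is one whose examples collectively light up more distinct sparse features, which is what makes the metric both quantifiable and inspectable feature-by-feature.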

📝 Abstract
Current pre-trained large language models typically need instruction tuning to align with human preferences. However, instruction tuning data is often quantity-saturated due to large-scale data collection and fast model iteration, leaving coreset data selection important but underexplored. On the other hand, existing quality-driven data selection methods such as LIMA (Zhou et al., NeurIPS 2023) and AlpaGasus (Chen et al., ICLR 2024) generally ignore the equal importance of data diversity and complexity. In this work, we aim to design a diversity-aware data selection strategy and creatively propose using sparse autoencoders to tackle the challenge of measuring data diversity. In addition, sparse autoencoders can provide more interpretability of model behavior and explain, e.g., the surprising effectiveness of selecting the longest responses (Zhao et al., ICML 2024). Using effective data selection, we experimentally show that models trained on our selected data outperform other methods in terms of model capabilities, reduce training cost, and potentially gain more control over model behaviors.
Problem

Research questions and friction points this paper is trying to address.

Enhance language model alignment with human preferences.
Address data diversity in instruction tuning.
Reduce training costs through effective data selection.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse autoencoder for diversity measure
Diversity-aware data selection strategy
Enhanced model interpretability and control
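One natural way to turn a feature-coverage diversity measure into a selection strategy is greedy set cover: repeatedly pick the example whose active sparse features add the most not-yet-covered features. This is a hypothetical sketch of that idea (the paper's actual selection algorithm may differ), with `greedy_select` and the toy feature sets as assumptions:

```python
def greedy_select(feature_sets, budget):
    """Greedy coverage selection: at each step, pick the example whose
    active SAE feature ids add the most features not yet covered
    (classic greedy set cover)."""
    covered, chosen = set(), []
    for _ in range(budget):
        best, gain = None, -1
        for i, feats in enumerate(feature_sets):
            if i in chosen:
                continue
            g = len(feats - covered)   # marginal diversity gain
            if g > gain:
                best, gain = i, g
        chosen.append(best)
        covered |= feature_sets[best]
    return chosen

# Toy example: each instruction activates a small set of feature ids.
sets = [{0, 1}, {1, 2, 3}, {4}, {0, 4, 5}]
print(greedy_select(sets, 2))  # → [1, 3]
```

The greedy choice is what connects diversity to reduced training cost: a small budget of examples can still cover most of the feature space, so the selected coreset retains broad distribution coverage.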