Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing post-training data construction methods rely on text-level diversity metrics, which often fail to capture the semantic features critical to downstream task performance. This work proposes a feature-driven approach to measure and synthesize diverse data within an interpretable feature space of large language models. By leveraging sparse autoencoders, we extract interpretable features shared across multiple models (LLaMA, Mistral, and Qwen) and introduce Feature Activation Coverage (FAC) as a novel diversity metric. Using FAC, we identify underrepresented features in seed data and generate targeted supplementary samples to fill these gaps. Empirical results demonstrate that this method significantly enhances both data diversity and model performance across a range of tasks, including instruction following, toxicity detection, reward modeling, and behavior steering.

📝 Abstract

The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics that capture linguistic variation, but such metrics provide only weak signals for the task-relevant features that determine downstream performance. In this work, we introduce Feature Activation Coverage (FAC), which measures data diversity in an interpretable feature space. Building upon this metric, we further propose a diversity-driven data synthesis framework, named FAC Synthesis, that first uses a sparse autoencoder to identify missing features from a seed dataset, and then generates synthetic samples that explicitly reflect these features. Experiments show that our approach consistently improves both data diversity and downstream performance on various tasks, including instruction following, toxicity detection, reward modeling, and behavior steering. Interestingly, we identify a shared, interpretable feature space across model families (i.e., LLaMA, Mistral, and Qwen), enabling cross-model knowledge transfer. Our work provides a solid and practical methodology for exploring data-centric optimization of LLMs.
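
The coverage idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact definition of FAC, so the version below is an assumption: a feature counts as covered when at least one sample activates it above a threshold, and coverage is the fraction of features covered. The function names, the threshold, and the toy activation values are all hypothetical; in practice the activations would come from a sparse autoencoder applied to model hidden states.

```python
from typing import List, Set


def active_features(activations: List[List[float]], threshold: float = 0.0) -> Set[int]:
    """Indices of features that exceed `threshold` on at least one sample.

    `activations` is a samples-by-features table of (hypothetical) sparse
    autoencoder activations, one row per data sample.
    """
    covered: Set[int] = set()
    for row in activations:
        for idx, value in enumerate(row):
            if value > threshold:
                covered.add(idx)
    return covered


def feature_activation_coverage(
    activations: List[List[float]], num_features: int, threshold: float = 0.0
) -> float:
    """Assumed coverage score: fraction of features activated by the dataset."""
    return len(active_features(activations, threshold)) / num_features


def missing_features(
    activations: List[List[float]], num_features: int, threshold: float = 0.0
) -> List[int]:
    """Features that no sample activates; candidates for targeted synthesis."""
    covered = active_features(activations, threshold)
    return [i for i in range(num_features) if i not in covered]


# Toy seed set: two samples, four features (values are made up).
seed = [
    [0.9, 0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.7],
]
seed_coverage = feature_activation_coverage(seed, num_features=4)
gaps = missing_features(seed, num_features=4)

# A synthetic sample written to express feature 1 closes one of the gaps.
augmented = seed + [[0.0, 0.8, 0.0, 0.0]]
augmented_coverage = feature_activation_coverage(augmented, num_features=4)
```

Here the seed set covers features 0 and 3, so features 1 and 2 are reported as gaps; adding one sample that activates feature 1 raises the assumed coverage score from 0.5 to 0.75. The synthesis step itself (prompting a model to write samples that express a missing feature) is not shown.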

Problem

Research questions and friction points this paper is trying to address.

data diversity

large language models

post-training data

feature space

downstream performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature Activation Coverage

Data Synthesis

Sparse Autoencoder

Feature Space

Cross-Model Transfer
