Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning

📅 2025-10-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing online batch data selection methods for LLM supervised fine-tuning (SFT) suffer from three key limitations: overreliance on utility metrics while neglecting diversity, dependence on external reference models or validation sets, and additional training overhead. This paper proposes the first external-dependency-free, efficient online batch filtering framework that jointly models data utility and intra- and inter-sample diversity. Specifically, it unifies utility and intra-sample diversity quantification via the nuclear norm of the logits matrix; estimates inter-sample diversity by comparing low-dimensional embeddings of historical samples, obviating backward propagation; and incorporates a lightweight memory buffer for dynamic optimization. Extensive experiments across multiple benchmarks demonstrate that the method significantly reduces training time while consistently outperforming state-of-the-art approaches under varying data budgets.

📝 Abstract
Supervised fine-tuning (SFT) is a commonly used technique to adapt large language models (LLMs) to downstream tasks. In practice, SFT on a full dataset is computationally expensive and sometimes suffers from overfitting or bias amplification. This has motivated the rise of data curation in SFT, which prioritizes the most valuable data to optimize. This work studies the online batch selection family, which dynamically scores and filters samples during the training process. However, existing popular methods often (i) rely merely on the utility of data to select a subset while neglecting other crucial factors like diversity, (ii) rely on external resources such as reference models or validation sets, and (iii) incur extra training time over full-dataset training. To address these limitations, this work develops **UDS (Utility-Diversity Sampling)**, a framework for efficient online batch selection in SFT. UDS leverages the nuclear norm of the logits matrix to capture both data utility and intra-sample diversity, while estimating inter-sample diversity through efficient low-dimensional embedding comparisons with a lightweight memory buffer of historical samples. Such a design eliminates the need for external resources and unnecessary backpropagation, securing computational efficiency. Experiments on multiple benchmarks demonstrate that UDS consistently outperforms state-of-the-art online batch selection methods under varying data budgets, and significantly reduces training time compared to full-dataset fine-tuning. Code is available at https://github.com/gfyddha/UDS.
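The nuclear-norm scoring idea from the abstract can be sketched in a few lines: the nuclear norm (sum of singular values) of a sample's logits matrix is large when the logits span many directions, which the paper uses as a joint proxy for utility and intra-sample diversity. The snippet below is a minimal illustration of that scoring rule, not the paper's implementation; the function names and the plain top-k selection are assumptions for the sketch.

```python
import numpy as np

def nuclear_norm_score(logits: np.ndarray) -> float:
    """Score a sample by the nuclear norm (sum of singular values)
    of its logits matrix of shape (seq_len, vocab_size)."""
    return float(np.linalg.norm(logits, ord="nuc"))

def select_topk(batch_logits: list, k: int) -> list:
    """Hypothetical selector: keep indices of the k highest-scoring
    samples in the current batch."""
    scores = [nuclear_norm_score(l) for l in batch_logits]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Note that the score needs only a forward pass (logits), which is why this style of selection avoids the extra backpropagation that gradient-based scoring methods require.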
Problem

Research questions and friction points this paper is trying to address.

Optimizes data selection for LLM fine-tuning by balancing utility and diversity
Eliminates reliance on external resources like reference models or validation sets
Reduces computational costs and training time compared to full dataset training
Innovation

Methods, ideas, or system contributions that make the work stand out.

UDS uses the nuclear norm of the logits matrix to jointly quantify data utility and intra-sample diversity
It estimates inter-sample diversity by comparing low-dimensional embeddings against a lightweight memory buffer of historical samples
It eliminates external resources (reference models, validation sets) and avoids unnecessary backpropagation
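The buffer-based inter-sample diversity estimate described above can be sketched as follows. This is a hedged illustration, not the paper's code: the FIFO buffer, the cosine-similarity comparison, and the "one minus maximum similarity" diversity score are all assumptions standing in for whatever concrete rule UDS uses.

```python
from collections import deque
import numpy as np

class DiversityBuffer:
    """Lightweight FIFO memory of recent sample embeddings.
    A candidate's inter-sample diversity is taken here as one minus
    its maximum cosine similarity to any buffered embedding."""

    def __init__(self, capacity: int = 256):
        self.buf = deque(maxlen=capacity)  # old embeddings evicted automatically

    def _normalize(self, emb: np.ndarray) -> np.ndarray:
        return emb / (np.linalg.norm(emb) + 1e-8)

    def diversity(self, emb: np.ndarray) -> float:
        if not self.buf:
            return 1.0  # nothing to compare against yet
        emb = self._normalize(emb)
        sims = [float(emb @ h) for h in self.buf]
        return 1.0 - max(sims)

    def add(self, emb: np.ndarray) -> None:
        self.buf.append(self._normalize(emb))
```

Because the comparison operates on small fixed-size embeddings rather than model gradients, maintaining and querying the buffer adds negligible cost per batch.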
Heming Zou
Tsinghua University
Machine Learning
Yixiu Mao
Department of Automation, Tsinghua University
Yun Qu
Department of Automation, Tsinghua University
Qi Wang
Department of Automation, Tsinghua University
Xiangyang Ji
Department of Automation, Tsinghua University