Filter-then-Weight: Online Data Selection and Reweighting for LLM Fine-Tuning

📅 2026-03-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing gradient-based data selection methods are mostly designed for offline use and handle online fine-tuning poorly: data arrives as a stream, the utility of a sample changes from step to step, and adaptive optimizers reshape the geometry of each update. This work proposes an optimizer-aware framework for online data selection and reweighting that treats selection as constructing a target-oriented next-step update conditioned on the current optimizer state. The formulation is connected to second-order target utility and explicitly accounts for interactions and redundancy among the selected samples. A factorized outer-product gradient representation, together with optimized matrix computations, keeps the method practical for long-context data. The resulting two-stage Filter-then-Weight algorithm first filters geometrically useful candidates and then optimizes their weights. Under the same data budget, it consistently improves convergence and downstream performance over existing online data selection baselines in LLM fine-tuning.
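To make the factorized outer-product idea concrete, here is a minimal NumPy sketch of one standard identity it can rely on. It assumes a single linear layer whose per-sample gradient is G = Dᵀ X, where X holds the per-token layer inputs and D the per-token output gradients. The function names and shapes are illustrative assumptions, not the paper's implementation, and the paper's optimized long-context computations may differ.

```python
import numpy as np

def grad_inner_factored(X_i, D_i, X_j, D_j):
    """Inner product <G_i, G_j> for G = D.T @ X, without forming G.

    X_*: (T, d_in) per-token layer inputs (assumed shape).
    D_*: (T, d_out) per-token output gradients (assumed shape).
    """
    # Two (T_i, T_j) token-level Gram matrices, combined elementwise.
    return float(np.sum((D_i @ D_j.T) * (X_i @ X_j.T)))

def grad_inner_dense(X_i, D_i, X_j, D_j):
    """Reference version that materializes the (d_out, d_in) gradients."""
    G_i = D_i.T @ X_i
    G_j = D_j.T @ X_j
    return float(np.sum(G_i * G_j))
```

The factored form never materializes a (d_out, d_in) gradient, so memory scales with sequence lengths rather than layer size. Which form is cheaper depends on how the sequence lengths compare with the layer dimensions.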

📝 Abstract

Gradient-based data selection offers a principled framework for estimating sample utility in large language model (LLM) fine-tuning, but existing methods are mostly designed for offline settings. They are therefore less suited to online fine-tuning, where data arrives sequentially, sample utility is step-dependent, and the effective update geometry is shaped by adaptive optimizers. We propose an optimizer-aware framework for gradient-based online data selection and reweighting in LLM fine-tuning. Our key idea is to view online selection not as static sample ranking, but as shaping the next target-oriented update under the current optimizer state. We formulate this as an optimizer-aware update-matching problem, establish its connection to second-order target utility, and show why subset-level construction must account for interactions and redundancy among selected samples. Based on this view, we develop a two-stage Filter-then-Weight algorithm that first filters geometrically useful candidates and then optimizes their coefficients. To make the framework practical for LLMs, we introduce a factorized outer-product gradient representation and optimized matrix computations for long-context data. Experiments show that our method consistently improves convergence and downstream performance over existing online data selection baselines under the same data budget.
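The two-stage idea can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the optimizer state is reduced to an Adam-style diagonal second-moment estimate `v`, the filter is a top-k on preconditioned alignment with a target gradient, and the weights come from a ridge-regularized least-squares fit clipped to be non-negative.

```python
import numpy as np

def filter_then_weight(G, g_target, v, k, eps=1e-8, ridge=1e-3):
    """Hypothetical two-stage selection sketch.

    G: (n, d) candidate per-sample gradients.
    g_target: (d,) gradient of the target objective.
    v: (d,) Adam-style second-moment estimate (assumed optimizer state).
    Returns indices of kept samples and their non-negative weights.
    """
    p = 1.0 / (np.sqrt(v) + eps)      # diagonal preconditioner
    A = G * p                         # preconditioned candidate updates
    b = g_target * p                  # preconditioned target update

    # Stage 1 (filter): keep the top-k candidates with positive alignment.
    scores = A @ b
    order = np.argsort(-scores, kind="stable")[:k]
    keep = order[scores[order] > 0]
    if keep.size == 0:
        return keep, np.zeros(0)

    # Stage 2 (weight): match the target update with a weighted sum.
    # The Gram matrix carries sample-sample interactions, so redundant
    # samples end up sharing weight instead of being counted twice.
    A_s = A[keep]
    gram = A_s @ A_s.T + ridge * np.eye(keep.size)
    w = np.linalg.solve(gram, A_s @ b)
    return keep, np.clip(w, 0.0, None)
```

In a toy case with two identical candidates and one distinct candidate that are equally aligned with the target, the duplicates split roughly the weight that the distinct sample receives alone. This is the redundancy effect that ranking samples independently would miss.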

Problem

Research questions and friction points this paper is trying to address.

online fine-tuning

data selection

large language models

gradient-based selection

adaptive optimizers

Innovation

Methods, ideas, or system contributions that make the work stand out.

optimizer-aware

online data selection

gradient-based reweighting