One Algorithm, Two Goals: Dual Scoring for Parameter and Data Selection in LLM Fine-Tuning

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a coordination problem in large language model fine-tuning: parameter selection and data selection are usually guided by separate scoring mechanisms, which adds redundant computation and makes joint selection difficult. The authors cast both as bilevel selection problems under a common validation objective and propose DualSFT, which builds a gradient interaction matrix whose column-wise aggregation gives parameter importance and whose row-wise aggregation gives data utility. This row-column correspondence yields closed-form scores, derived under first- and second-order validation-improvement approximations, from which a parameter mask and a data subset are co-extracted in one shot. On 3B–9B models, single-axis DualSFT variants improve target-task performance and the stability-plasticity trade-off within their comparison groups, and full DualSFT achieves a more favorable joint-constrained trade-off than sequential hybrid baselines under matched budgets.

📝 Abstract

In Large Language Model (LLM) fine-tuning, parameter and data selection are common strategies for reducing fine-tuning cost, yet they are typically driven by separate scoring mechanisms. When a parameter mask and data subset jointly determine restricted fine-tuning, this separation incurs redundant overhead and makes coordinated selection difficult. We cast parameter and data selection as two bilevel selection problems under a common validation objective and derive a shared local response-surrogate scoring rule. Under first- and second-order validation-improvement approximations, parameter importance and data utility emerge as column-wise and row-wise aggregations of a single gradient interaction matrix, yielding a closed-form row-column correspondence for co-extracting both signals. Building on this structure, we propose DualSFT (Dual-Selection Fine-Tuning), a one-shot dual-scoring algorithm that produces a parameter mask and data subset from shared gradient statistics. On 3B-9B LLMs, single-axis DualSFT variants strengthen target-task performance and stability-plasticity trade-offs within their comparison groups, while full DualSFT yields a more favorable joint-constrained trade-off than sequential hybrid baselines under matched budgets.
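
The row-column correspondence described in the abstract can be illustrated with a small first-order sketch. The abstract does not give the exact form of the gradient interaction matrix, so the code below assumes one plausible reading: entry (i, j) is the product of the validation gradient and the gradient of training example i at parameter j, with larger values taken to mean a larger expected validation improvement. The names `dual_scores` and `top_k`, the toy gradients, and the budgets are hypothetical, and the second-order term is omitted.

```python
def dual_scores(train_grads, val_grad):
    """Score examples and parameters from one shared matrix (illustrative only).

    train_grads: list of per-example gradient vectors (rows = examples).
    val_grad:    validation gradient vector (one entry per parameter).
    """
    # Assumed interaction matrix: elementwise product of each training
    # gradient with the validation gradient.
    interaction = [[g * v for g, v in zip(row, val_grad)] for row in train_grads]
    # Row-wise aggregation: one utility score per training example.
    data_utility = [sum(row) for row in interaction]
    # Column-wise aggregation: one importance score per parameter.
    param_importance = [sum(col) for col in zip(*interaction)]
    return interaction, data_utility, param_importance


def top_k(scores, k):
    """Return the sorted indices of the k highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy gradients: 4 training examples, 3 parameters (made-up numbers).
train_grads = [
    [0.5, -0.2, 0.1],
    [-0.3, 0.4, 0.0],
    [0.2, 0.1, -0.6],
    [0.1, 0.3, 0.2],
]
val_grad = [1.0, 0.8, -1.0]

interaction, data_utility, param_importance = dual_scores(train_grads, val_grad)
data_subset = top_k(data_utility, k=2)      # indices of examples to keep
param_mask = top_k(param_importance, k=2)   # indices of parameters to update
print("data subset:", data_subset)
print("parameter mask:", param_mask)
```

Both score vectors come from a single pass over the same matrix, which is the source of the shared-statistics saving the abstract describes; a sequential pipeline would instead compute a separate score for each axis.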

Problem

Research questions and friction points this paper is trying to address.

parameter selection

data selection

LLM fine-tuning

coordinated selection

bilevel optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

dual scoring

parameter selection

data selection