FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the low sample efficiency of supervised fine-tuning (SFT) for large language models (LLMs), aiming to improve generalization to new domains under limited annotation budgets. The authors propose an information-gain-based data selection framework with two main components: (1) the first integration of Fisher information approximation with information-gain maximization for SFT; and (2) a tractable, low-overhead Hessian estimation method that linearizes the LLM at the last layer as a multinomial logistic regression model, enabling efficient active sampling. Experiments across multiple tasks show that using only 30%–50% of the training data achieves performance comparable or superior to full-data SFT, as validated by both LLM-based automatic evaluation and human assessment, significantly outperforming random sampling and state-of-the-art baselines. The core contribution is a theoretically grounded, computationally efficient, information-driven sample selection paradigm designed specifically for SFT.

📝 Abstract
Supervised fine-tuning (SFT) is a standard approach to adapting large language models (LLMs) to new domains. In this work, we improve the statistical efficiency of SFT by selecting an informative subset of training examples. Specifically, for a fixed budget of training examples, which determines the computational cost of fine-tuning, we determine the most informative ones. The key idea in our method is to select examples that maximize information gain, measured by the Hessian of the log-likelihood of the LLM. We approximate it efficiently by linearizing the LLM at the last layer using multinomial logistic regression models. Our approach is computationally efficient, analyzable, and performs well empirically. We demonstrate this on several problems, and back our claims with both quantitative results and an LLM evaluation.
Problem

Research questions and friction points this paper is trying to address.

How can SFT be made more statistically efficient by selecting an informative subset of training examples under a fixed budget?
How can an example's information gain be measured via the Hessian of the LLM's log-likelihood?
How can that Hessian be approximated tractably, given that computing it exactly for an LLM is infeasible?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selects the training examples that maximize information gain under a fixed fine-tuning budget
Measures information gain via the Hessian of the LLM's log-likelihood
Approximates the Hessian efficiently by linearizing the LLM at the last layer as a multinomial logistic regression model
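The selection procedure described above can be illustrated with a minimal sketch. This is not the authors' released code: it assumes each example is summarized by a last-layer feature vector (here, random placeholders), scores each candidate by how much it would increase the log-determinant of an accumulated Fisher information estimate (a greedy D-optimal design criterion), and uses a Sherman–Morrison rank-one update to keep each step cheap. The function name `greedy_fisher_select` and the rank-one (per-example outer-product) simplification of the Fisher matrix are assumptions for illustration.

```python
import numpy as np

def greedy_fisher_select(embeddings, budget, reg=1e-3):
    """Greedily pick `budget` examples whose last-layer embeddings most
    increase log det of a regularized Fisher information estimate.

    embeddings: (n, d) array, one feature vector per training example.
    Returns the list of selected example indices, in selection order.
    """
    n, d = embeddings.shape
    A_inv = np.eye(d) / reg  # inverse of the regularized Fisher estimate
    selected, remaining = [], set(range(n))
    for _ in range(min(budget, n)):
        # Marginal gain of candidate i is log(1 + phi_i^T A^{-1} phi_i),
        # which is monotone in the quadratic form, so we compare that directly.
        best_i, best_score = None, -np.inf
        for i in remaining:
            phi = embeddings[i]
            score = phi @ A_inv @ phi
            if score > best_score:
                best_i, best_score = i, score
        phi = embeddings[best_i]
        # Sherman-Morrison rank-1 update: (A + phi phi^T)^{-1}
        Av = A_inv @ phi
        A_inv -= np.outer(Av, Av) / (1.0 + phi @ Av)
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

# Hypothetical usage with random placeholder embeddings.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))
subset = greedy_fisher_select(features, budget=10)
```

Greedy maximization is the natural choice here because the log-det objective is monotone submodular, so the greedy subset comes with a constant-factor approximation guarantee while each step costs only one matrix-vector product per candidate.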