Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

📅 2024-10-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Nearest-neighbor retrieval, the standard way to select data for test-time fine-tuning (TTFT), tends to choose redundant examples. This paper proposes SIFT, a data selection algorithm that unifies ideas from retrieval and active learning: rather than ranking examples by similarity alone, SIFT selects the set that maximizes overall information gain about the model's response to the prompt, explicitly accounting for information duplication. Its key contributions are: (1) a theoretical argument that nearest-neighbor retrieval selects redundant data, limiting its effectiveness or even hurting performance; (2) uncertainty estimates that predict the performance gain of TTFT, enabling an adaptive algorithm that invests test-time compute in proportion to realized gains; and (3) consistent improvements over nearest-neighbor retrieval on prompt-specific language modeling on the Pile, with minimal computational overhead. To facilitate adoption, the authors release the *activeft* library as a drop-in replacement for nearest-neighbor retrieval.

📝 Abstract
Recent efforts in fine-tuning language models often rely on automatic data selection, commonly using Nearest Neighbors retrieval from large datasets. However, we theoretically show that this approach tends to select redundant data, limiting its effectiveness or even hurting performance. To address this, we introduce SIFT, a data selection algorithm designed to reduce uncertainty about the model's response given a prompt, which unifies ideas from retrieval and active learning. Whereas Nearest Neighbor retrieval typically fails in the presence of information duplication, SIFT accounts for information duplication and optimizes the overall information gain of the selected examples. We focus our evaluations on fine-tuning at test-time for prompt-specific language modeling on the Pile dataset, and show that SIFT consistently outperforms Nearest Neighbor retrieval, with minimal computational overhead. Moreover, we show that our uncertainty estimates can predict the performance gain of test-time fine-tuning, and use this to develop an adaptive algorithm that invests test-time compute proportional to realized performance gains. We provide the $\texttt{activeft}$ (Active Fine-Tuning) library which can be used as a drop-in replacement for Nearest Neighbor retrieval.
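The selection idea described in the abstract can be sketched as a greedy uncertainty-reduction loop: at each step, pick the candidate that most reduces the posterior variance of the model's response at the prompt, which automatically penalizes near-duplicates of already-selected data. Everything below is an illustrative assumption (a linear kernel over embeddings, the function name `sift_select`, the noise parameter), not the paper's actual implementation from `activeft`.

```python
import numpy as np

def sift_select(prompt_emb, cand_embs, k, noise=1e-2):
    """Greedy uncertainty-reduction selection (hedged sketch of the SIFT idea).

    Under a simple linear kernel k(x, y) = x @ y over normalized embeddings,
    repeatedly pick the candidate that most reduces the posterior variance of
    the response at the prompt. A duplicate of an already-selected example
    barely reduces this variance, so redundant data is skipped.
    """
    q = prompt_emb / np.linalg.norm(prompt_emb)
    X = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    selected = []
    for _ in range(k):
        best, best_var = None, np.inf
        for i in range(len(X)):
            if i in selected:
                continue
            S = selected + [i]
            # kernel matrix of the tentative selection, plus observation noise
            K_SS = X[S] @ X[S].T + noise * np.eye(len(S))
            k_qS = X[S] @ q
            # posterior variance of the response at the prompt after observing S
            var = 1.0 - k_qS @ np.linalg.solve(K_SS, k_qS)
            if var < best_var:
                best, best_var = i, var
        selected.append(best)
    return selected
```

With a prompt between two directions and a duplicated candidate, this sketch selects one copy plus the complementary example, whereas plain nearest-neighbor retrieval would happily return both copies of the duplicate.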
Problem

Research questions and friction points this paper is trying to address.

Automatic data selection for fine-tuning LLMs commonly relies on nearest-neighbor retrieval.
Nearest-neighbor retrieval tends to select redundant data, limiting or even hurting performance.
Test-time fine-tuning needs informative data and a way to budget test-time compute.
Innovation

Methods, ideas, or system contributions that make the work stand out.

SIFT algorithm reduces data redundancy in selection.
Active fine-tuning optimizes overall information gain.
Uncertainty estimates predict test-time performance gains.
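The last point above, that uncertainty estimates predict the gain from test-time fine-tuning, is what enables the paper's adaptive compute allocation. A minimal sketch of that control loop, assuming a hypothetical sequence of predicted uncertainties after each fine-tuning step (the function name and thresholds are illustrative, not from the paper):

```python
def adaptive_test_time_steps(uncertainties, min_gain=0.05, max_steps=10):
    """Hedged sketch of compute-proportional test-time fine-tuning.

    `uncertainties` is a hypothetical sequence of the model's predicted
    uncertainty about its response after each fine-tuning step. Keep
    fine-tuning only while each step's predicted uncertainty reduction
    exceeds `min_gain`, so test-time compute tracks realized gains.
    """
    steps = 0
    for prev, cur in zip(uncertainties, uncertainties[1:]):
        if steps >= max_steps or (prev - cur) < min_gain:
            break  # diminishing returns: stop spending test-time compute
        steps += 1
    return steps
```

For a prompt where uncertainty plateaus quickly, this stops after a couple of steps; for a hard prompt with steadily shrinking uncertainty, it keeps fine-tuning up to the step budget.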