Optimal Budgeted Adaptation of Large Language Models

📅 2026-02-01
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the trade-off between limited labeled data and downstream performance in large language model fine-tuning by formalizing budget-aware supervised fine-tuning as a contextual Stackelberg game. The learner, acting as the leader, jointly optimizes scoring and label-query strategies, while the environment, as the follower, responds by generating the most challenging samples. A total supervision budget constraint is explicitly embedded into the objective function. Under linear contextual assumptions, an efficient label selection mechanism is achieved via a Largest-Latency-First (LLF) confidence-gating strategy. Theoretical analysis shows that under full feedback, the algorithm achieves an Õ(d√T) regret bound; with LLF integration, it further attains a budget-aware regret bound of Õ(√(dB) + c√B), where B = βT, substantially improving label efficiency.
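The summary describes selectively querying labels via a confidence gate under a total budget B = βT in a linear contextual setting. The paper's actual LLF rule is not spelled out here, so the sketch below is a minimal, hypothetical illustration of budget-gated label querying: it uses the standard linear-contextual uncertainty width √(xᵀA⁻¹x) as the gate signal, and the `gate` threshold, update rule, and function names are assumptions for illustration only.

```python
import numpy as np

def confidence_width(x, A_inv):
    # Standard linear-contextual uncertainty: sqrt(x^T A^{-1} x).
    return float(np.sqrt(x @ A_inv @ x))

def budgeted_gate_run(contexts, labels, budget, gate=0.5, lam=1.0):
    """Query a label only when the model is uncertain AND budget remains.

    contexts: (T, d) array of context vectors
    labels:   (T,) array of supervision signals (only consumed when queried)
    budget:   total number of label queries allowed (B = beta * T)
    """
    d = contexts.shape[1]
    A = lam * np.eye(d)            # regularized Gram matrix
    b = np.zeros(d)
    queried = 0
    for x, y in zip(contexts, labels):
        A_inv = np.linalg.inv(A)
        if queried < budget and confidence_width(x, A_inv) > gate:
            queried += 1           # spend one unit of the supervision budget
            A += np.outer(x, x)    # model is updated only on queried labels
            b += y * x
    theta_hat = np.linalg.solve(A, b)  # ridge estimate from queried pairs
    return theta_hat, queried
```

As uncertainty shrinks along directions already covered by queried contexts, later rounds fall below the gate, so label spend concentrates on informative samples rather than being uniform over T.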

šŸ“ Abstract
The trade-off between labeled data availability and downstream accuracy remains a central challenge in fine-tuning large language models (LLMs). We propose a principled framework for \emph{budget-aware supervised fine-tuning} by casting LLM adaptation as a contextual Stackelberg game. In our formulation, the learner (leader) commits to a scoring policy and a label-querying strategy, while an adaptive environment (follower) selects challenging supervised alternatives in response. To explicitly address label efficiency, we incorporate a finite supervision budget directly into the learning objective. Our algorithm operates in the full-feedback regime and achieves $\tilde{O}(d\sqrt{T})$ regret under standard linear contextual assumptions. We extend the framework with a Largest-Latency-First (LLF) confidence gate that selectively queries labels, achieving a budget-aware regret bound of $\tilde{O}(\sqrt{dB} + c\sqrt{B})$ with $B=\beta T$.
Problem

Research questions and friction points this paper is trying to address.

budgeted adaptation
large language models
supervised fine-tuning
label efficiency
supervision budget
Innovation

Methods, ideas, or system contributions that make the work stand out.

budget-aware fine-tuning
Stackelberg game
label efficiency
regret bound
confidence gating