Optimization-Inspired Few-Shot Adaptation for Large Language Models

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of poor adaptability, high computational overhead, and weak generalization of large language models (LLMs) in few-shot settings, this paper proposes a parameter-free reinterpretation of the forward pass. It recasts LLM inference as a sequence of preconditioned gradient descent steps with learnable preconditioners, explicitly linking forward propagation to preconditioner learning under a convergence constraint. Crucially, the implicit parameterization steers the optimization trajectory toward flat minima. Unlike context-based prompting, which relies heavily on prompt engineering, or parameter-efficient fine-tuning (PEFT), which introduces extra parameters and inference latency, this approach incurs zero parameter overhead while preserving full model capacity. Empirically, it achieves state-of-the-art performance across diverse few-shot tasks, demonstrating strong generalization, negligible adaptation cost (no additional trainable parameters), and theoretical interpretability grounded in optimization theory.

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in real-world applications. However, adapting LLMs to novel tasks via fine-tuning often requires substantial training data and computational resources that are impractical in few-shot scenarios. Existing approaches face key limitations: In-Context Learning (ICL) introduces additional inference overhead with limited performance gains, while Parameter-Efficient Fine-Tuning (PEFT) models are prone to overfitting on the few demonstration examples. In this work, we reinterpret the forward pass of LLMs as an optimization process: a sequence of preconditioned gradient descent steps refining internal representations. Based on this connection, we propose Optimization-Inspired Few-Shot Adaptation (OFA), integrating a parameterization that learns preconditioners without introducing additional trainable parameters, and an objective that improves optimization efficiency by learning preconditioners subject to a convergence bound while simultaneously steering the optimization path toward a flat local minimum. Our method overcomes the respective issues of ICL-based and PEFT-based approaches, and in experiments outperforms existing methods on a variety of few-shot adaptation tasks.
Problem

Research questions and friction points this paper is trying to address.

Adapting LLMs to novel tasks with limited data
Overcoming overfitting in few-shot fine-tuning methods
Reducing computational overhead in few-shot adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinterprets LLM forward pass as optimization
Learns preconditioners without extra parameters
Improves optimization efficiency via convergence bound
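The core reinterpretation can be illustrated with a toy sketch (this is not the paper's OFA implementation; the quadratic objective, the matrices `A`, `b`, and the closed-form preconditioner are illustrative assumptions): a layer's update of a hidden representation `h` is read as one preconditioned gradient descent step on an implicit loss, and a well-learned preconditioner makes each "layer" a far more efficient optimization step.

```python
import numpy as np

def implicit_grad(h, A, b):
    # Gradient of an assumed implicit objective L(h) = 0.5 * ||A h - b||^2.
    return A.T @ (A @ h - b)

def preconditioned_step(h, P, A, b, lr=1.0):
    # One forward "layer" read as a preconditioned descent step:
    # h_{l+1} = h_l - lr * P @ grad L(h_l), with preconditioner P.
    return h - lr * P @ implicit_grad(h, A, b)

rng = np.random.default_rng(0)
d = 8
A = rng.standard_normal((d, d))
b = rng.standard_normal(d)
h = rng.standard_normal(d)

# With P = identity this is plain gradient descent; with the inverse
# Hessian (A^T A)^{-1} as preconditioner, a single step solves the
# quadratic exactly, illustrating why learning good preconditioners
# can make each forward step dramatically more efficient.
P = np.linalg.inv(A.T @ A)
h_next = preconditioned_step(h, P, A, b)
residual = np.linalg.norm(A @ h_next - b)
print(residual)
```

In the paper's setting the preconditioners are learned from the few-shot demonstrations under a convergence bound rather than computed in closed form, but the sketch captures the mechanical idea of the reinterpretation.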