HyperFlow: Gradient-Free Emulation of Few-Shot Fine-Tuning

📅 2025-04-21
🤖 AI Summary
To address the high computational and memory overhead of test-time fine-tuning in few-shot learning, caused by repeated backpropagation, this paper proposes a gradient-free, lightweight test-time adaptation method. The approach models gradient descent as the Euler discretization of an ordinary differential equation (ODE) and employs a task-conditioned auxiliary network to emulate the resulting numerical integration with a few forward passes, eliminating all forward and backward computation on the target model. The auxiliary network is meta-trained using only the support sets. Evaluated on the Meta-Dataset and CDFSL cross-domain few-shot classification benchmarks, the method significantly improves out-of-distribution generalization while reducing memory consumption to roughly 6% and adaptation time to about 0.02% of standard fine-tuning, achieving performance comparable to full-parameter fine-tuning.
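The core idea can be written out as follows (the notation here is illustrative rather than taken from the paper: $\mathcal{L}$ is the task loss, $S$ the support set, $\eta$ the step size, and $g_\phi$ the meta-trained auxiliary network):

```latex
% Gradient descent on the task loss is the Euler discretization
% of the gradient-flow ODE:
\frac{d\theta}{dt} = -\nabla_\theta \mathcal{L}(\theta; S)
\quad\Longrightarrow\quad
\theta_{k+1} = \theta_k - \eta \, \nabla_\theta \mathcal{L}(\theta_k; S)

% The auxiliary network g_\phi is trained to predict this
% task-conditional drift, so adaptation replaces the gradient
% with a forward pass of g_\phi -- no backpropagation needed:
\theta_{k+1} = \theta_k + \eta \, g_\phi(\theta_k, S)
```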

📝 Abstract
While test-time fine-tuning is beneficial in few-shot learning, the need for multiple backpropagation steps can be prohibitively expensive in real-time or low-resource scenarios. To address this limitation, we propose an approach that emulates gradient descent without computing gradients, enabling efficient test-time adaptation. Specifically, we formulate gradient descent as an Euler discretization of an ordinary differential equation (ODE) and train an auxiliary network to predict the task-conditional drift using only the few-shot support set. The adaptation then reduces to a simple numerical integration (e.g., via the Euler method), which requires only a few forward passes of the auxiliary network -- no gradients or forward passes of the target model are needed. In experiments on cross-domain few-shot classification using the Meta-Dataset and CDFSL benchmarks, our method significantly improves out-of-domain performance over the non-fine-tuned baseline while incurring only 6% of the memory cost and 0.02% of the computation time of standard fine-tuning, thus establishing a practical middle ground between direct transfer and fully fine-tuned approaches.
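The adaptation loop described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: `drift_fn` stands in for the meta-trained auxiliary network (here a hand-written function rather than a learned model), and parameters are plain Python lists.

```python
def euler_adapt(theta, support, drift_fn, num_steps=50, step_size=0.1):
    """Gradient-free test-time adaptation sketch: instead of running
    backpropagation on the target model, integrate the parameter ODE
    d(theta)/dt = drift with Euler steps, where the drift comes from
    a forward pass of a (hypothetical) meta-trained auxiliary network."""
    for _ in range(num_steps):
        drift = drift_fn(theta, support)
        theta = [t + step_size * d for t, d in zip(theta, drift)]
    return theta

# Toy stand-in for the auxiliary network: it pushes the parameters
# toward a task-specific target derived from the support set.
def toy_drift(theta, support):
    return [s - t for s, t in zip(support, theta)]

theta0 = [0.0, 0.0, 0.0]
support = [1.0, -0.5, 0.25]
adapted = euler_adapt(theta0, support, toy_drift)
print(all(abs(a - s) < 1e-2 for a, s in zip(adapted, support)))  # prints True
```

Note that the loop only ever calls the auxiliary `drift_fn`; the target model is neither forward- nor backward-evaluated during adaptation, which is where the memory and latency savings come from.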
Problem

Research questions and friction points this paper is trying to address.

Emulate gradient descent without computing gradients for efficiency
Enable few-shot test-time adaptation with low computational cost
Improve out-of-domain performance with minimal memory and time overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emulates gradient descent without computing gradients
Uses auxiliary network for task-conditional drift prediction
Reduces adaptation to simple numerical integration
Donggyun Kim
PhD student at KAIST
Machine Learning, Deep Learning
Chanwoo Kim
KAIST School of Computing
Seunghoon Hong
KAIST School of Computing