🤖 AI Summary
To address the challenge of efficiently transferring large models in few-shot and test-time adaptation scenarios, this paper proposes BOLT, a framework that reuses existing multi-task pre-trained models without meta-training or heavy additional training. Its core innovation is to introduce task-aware orthogonal singular directions to construct a reusable spectral basis: dominant directions are extracted per layer via SVD and orthogonalized into a shared basis; the basis is then frozen, and only diagonal scaling coefficients per layer are learned for low-rank adaptation. This yields a strong, training-free initialization and enables ultra-lightweight fine-tuning (<0.1% trainable parameters). Experiments demonstrate that BOLT significantly outperforms state-of-the-art PEFT and meta-learning initialization methods on few-shot classification and test-time adaptation tasks, achieving both high efficiency and robustness.
📝 Abstract
Adapting large pre-trained models to unseen tasks under tight data and compute budgets remains challenging. Meta-learning approaches explicitly learn good initializations, but they require an additional meta-training phase over many tasks, incur high training cost, and can be unstable. At the same time, the number of task-specific pre-trained models continues to grow, yet how to transfer them to new tasks with minimal additional training remains relatively underexplored. We propose BOLT (Basis-Oriented Low-rank Transfer), a framework that reuses existing fine-tuned models not by merging weights, but by extracting an orthogonal, task-informed spectral basis and adapting within that subspace. In the offline phase, BOLT collects dominant singular directions from multiple task vectors and orthogonalizes them per layer to form reusable bases. In the online phase, we freeze these bases and train only a small set of diagonal coefficients per layer for the new task, yielding a rank-controlled update with very few trainable parameters. This design provides (i) a strong, training-free initialization for unseen tasks, obtained by pooling source-task coefficients and applying a lightweight rescaling step on the shared orthogonal bases, and (ii) a parameter-efficient fine-tuning (PEFT) path that, in our experiments, achieves robust performance compared to common PEFT baselines as well as a representative meta-learned initialization. Our results show that constraining adaptation to a task-informed orthogonal subspace provides an effective alternative for unseen-task transfer.
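The offline/online split described above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the layer shapes, per-task rank, QR orthogonalization, and mean-pooling of source-task coefficients are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 12, 2          # toy layer shape; top-r directions per task (assumed)

# Toy "task vectors": fine-tuned minus pre-trained weights for one layer, per source task.
task_vectors = [rng.standard_normal((d_out, d_in)) for _ in range(3)]

# Offline phase: collect dominant singular directions from each task vector ...
U_cols, V_cols = [], []
for dW in task_vectors:
    U, S, Vt = np.linalg.svd(dW, full_matrices=False)
    U_cols.append(U[:, :r])          # top-r left singular vectors
    V_cols.append(Vt[:r, :].T)       # top-r right singular vectors

# ... and orthogonalize them into shared per-layer bases (QR is one simple choice).
B_u, _ = np.linalg.qr(np.hstack(U_cols))   # (d_out, 3r), orthonormal columns
B_v, _ = np.linalg.qr(np.hstack(V_cols))   # (d_in, 3r), orthonormal columns

# Online phase: the bases stay frozen; only diagonal coefficients are trainable.
# A training-free initialization can pool each source task's diagonal coefficients.
coeffs = [np.diag(B_u.T @ dW @ B_v) for dW in task_vectors]
s_init = np.mean(coeffs, axis=0)           # pooled initialization (one possible rule)

delta_W = B_u @ np.diag(s_init) @ B_v.T    # rank-controlled update for the new task
print(delta_W.shape, s_init.shape)         # (16, 12) (6,)
```

The trainable parameter count here is just the length of `s_init` per layer, which is what makes the online phase ultra-lightweight relative to full fine-tuning.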