Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples

📅 2025-10-23
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing LASER-style adaptation of large language models (LLMs) boosts downstream accuracy without fine-tuning, but it suffers from a prohibitively expensive layer-wise search in which every candidate requires forward passes over the full dataset. Method: this paper proposes an efficient, fine-tuning-free adaptation framework that requires only 100 samples and a single gradient step. It identifies critical weight matrices via singular-value gradient analysis, then applies layer-selective low-rank compression coupled with a multi-subspace clustering decomposition. Contribution/Results: the method significantly mitigates overfitting while achieving parameter-efficient model compression. Compared to baselines that rely on full-dataset evaluation and exhaustive layer search, it drastically reduces computational cost and improves accuracy by up to 24.6 percentage points across multiple downstream tasks; the authors present it as the first approach to enable robust zero-shot adaptation with few-shot data, single-step gradient guidance, and cross-layer coordination.

๐Ÿ“ Abstract
Recently, Sharma et al. proposed LAyer-SElective Rank reduction (LASER), which demonstrated that pruning high-order components of carefully chosen weight matrices in an LLM can boost downstream accuracy -- without any gradient-based fine-tuning. Yet LASER's exhaustive per-matrix search (each candidate requiring full-dataset forward passes) makes it impractical for rapid deployment. We demonstrate that this overhead can be removed and find that: (i) only a small, carefully chosen subset of matrices needs to be inspected, eliminating the layer-by-layer sweep; (ii) the gradient of each matrix's singular values pinpoints which matrices merit reduction; (iii) enlarging the factorization search space by allowing a matrix's rows to cluster around multiple subspaces, and then decomposing each cluster separately, further reduces overfitting on the original training data and further lifts accuracy by up to 24.6 percentage points; and (iv) evaluating on just 100 samples rather than the full training data -- both for computing the indicative gradients and for measuring the final accuracy -- suffices to further reduce the search time; we attribute this to adaptation to downstream tasks being dominated by prompting style, not dataset size. Combining these findings yields a fast and robust adaptation algorithm: with a single gradient step on 100 examples and a quick scan of the top candidate layers and factorization techniques, we can adapt LLMs to new datasets -- entirely without fine-tuning.
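The core operation the abstract describes -- pruning the high-order components of a weight matrix -- can be sketched as a truncated SVD. The matrix size and `rank` below are illustrative placeholders, not settings from the paper:

```python
import numpy as np

def low_rank_approx(W, rank):
    """LASER-style reduction: keep only the top-`rank` singular
    directions of a weight matrix, pruning higher-order components."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))   # stand-in for one LLM weight matrix
W_hat = low_rank_approx(W, rank=4)  # same shape, rank at most 4
```

By the Eckart-Young theorem this truncation is the best rank-4 approximation of `W` in the Frobenius norm, which is why a single SVD per matrix suffices.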
Problem

Research questions and friction points this paper is trying to address.

Pruning LLM weights without exhaustive layer-by-layer search
Identifying critical matrices using singular value gradients
Adapting LLMs with minimal data and no fine-tuning
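The second point above -- ranking matrices by singular-value gradients -- can be sketched with first-order perturbation theory: if W = U diag(S) V^T and G is the loss gradient with respect to W from a single backward pass, then the sensitivity of the loss to the i-th singular value is u_i^T G v_i. The `layer_score` rule is a hypothetical illustration, not the paper's exact criterion:

```python
import numpy as np

def singular_value_grads(W, G):
    """First-order sensitivity of the loss to each singular value of W:
    for W = U diag(S) V^T and G = dL/dW, dL/dsigma_i = u_i^T G v_i."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return np.einsum('mi,mn,in->i', U, G, Vt)

def layer_score(W, G):
    """Hypothetical scoring rule: matrices whose singular values have
    large-magnitude gradients are flagged as candidates for reduction."""
    return np.abs(singular_value_grads(W, G)).max()
```

A quick sanity check: for the nuclear norm L(W) = sum of singular values, the gradient is G = U V^T, and `singular_value_grads` returns 1 for every singular value, as the formula predicts.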
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses singular value gradients to select layers
Expands factorization with multi-subspace clustering
Adapts models with one gradient step on 100 samples
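The multi-subspace expansion in the second bullet can be sketched as: cluster the rows of a weight matrix (plain k-means here), then low-rank-compress each cluster's sub-matrix separately instead of taking one global SVD. `n_clusters`, `rank`, and the k-means details are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def multi_subspace_compress(W, n_clusters=2, rank=2, n_iter=20, seed=0):
    """Cluster rows of W with a simple k-means sketch, then apply a
    separate truncated SVD to each cluster's sub-matrix."""
    rng = np.random.default_rng(seed)
    centers = W[rng.choice(len(W), n_clusters, replace=False)]
    for _ in range(n_iter):
        # assign each row to its nearest center, then recompute centers
        labels = np.argmin(((W[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = W[labels == c].mean(0)
    W_hat = W.copy()  # rows of empty clusters stay unchanged
    for c in range(n_clusters):
        rows = labels == c
        if not rows.any():
            continue
        U, S, Vt = np.linalg.svd(W[rows], full_matrices=False)
        r = min(rank, len(S))
        W_hat[rows] = (U[:, :r] * S[:r]) @ Vt[:r]
    return W_hat, labels
```

Because each cluster is decomposed against its own subspace, rows drawn from different subspaces are not forced through a single shared factorization, which is the intuition behind the reduced overfitting claimed above.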