AdaPaD: Adaptive Parallel Deflation for PEFT with Self-Correcting Rank Discovery

📅 2026-05-11

🤖 AI Summary

This work addresses a limitation of conventional LoRA fine-tuning: the rank must be fixed before training, so rank allocation cannot adapt during training. The authors propose AdaPaD, which trains all rank-1 components in parallel and combines a self-correcting deflation mechanism with module-level dynamic rank discovery, making the rank distribution a learned outcome rather than a preset hyperparameter. AdaPaD is presented as the first approach to unify dynamic rank allocation with a shared parameter budget. It does so through an importance-based rank-growth strategy and an advance-learning mechanism, in which each component is privately pre-trained before it is activated. The authors prove that every component's deflation error decays exponentially after a warm-up period, so the generalization bound reduces to an irreducible statistical floor as training proceeds. On GLUE with DeBERTaV3-base, AdaPaD is competitive with adaptive-rank LoRA baselines at matched parameter budgets. On SQuAD and SQuAD v2 with Qwen3-0.6B, it is competitive with fixed-rank LoRA while deploying an adapter that is on average 30.7% smaller.
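The idea of per-module rank growth under one shared budget can be illustrated with a simple greedy allocator. This is a minimal sketch under loudly stated assumptions, not AdaPaD's procedure: the importance scores are fixed, hypothetical numbers (the summary does not say how AdaPaD measures importance), a rank-1 LoRA component is assumed to cost `d_in + d_out` parameters, and `Module`, `grow_ranks`, and the module names are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    d_in: int
    d_out: int
    # Hypothetical importance of each successive rank-1 component (descending).
    importances: list[float]
    rank: int = 0

    @property
    def cost(self) -> int:
        # Assumed cost of one rank-1 component: one column of B plus one row of A.
        return self.d_in + self.d_out


def grow_ranks(modules: list[Module], budget: int) -> dict[str, int]:
    """Greedily add the most important next component, in any module,
    until no module can afford another one from the shared budget."""
    heap = [(-m.importances[0], i) for i, m in enumerate(modules) if m.importances]
    heapq.heapify(heap)
    remaining = budget
    while heap:
        _, i = heapq.heappop(heap)
        m = modules[i]
        if m.cost > remaining:
            continue  # unaffordable now and later (budget only shrinks); drop it
        m.rank += 1
        remaining -= m.cost
        if m.rank < len(m.importances):
            # Offer this module's next candidate component.
            heapq.heappush(heap, (-m.importances[m.rank], i))
    return {m.name: m.rank for m in modules}


modules = [
    Module("q_proj", 768, 768, [9.0, 4.0, 1.0, 0.5]),
    Module("v_proj", 768, 768, [6.0, 5.0, 3.0, 0.2]),
    Module("ffn", 768, 3072, [2.0, 1.5]),
]
ranks = grow_ranks(modules, budget=11520)
print(ranks)  # → {'q_proj': 2, 'v_proj': 3, 'ffn': 1}
```

The per-module ranks come out of the allocation rather than being set in advance. Because selection here uses raw importance, the wide `ffn` module still receives one component even though it costs 2.5 times a projection component; ranking by importance per parameter would be an equally plausible design choice.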

📝 Abstract

Fine-tuning large language models with LoRA requires choosing a rank r before training starts. Existing approaches either extract rank-1 components sequentially, freezing each component's error permanently into every subsequent residual, or optimize the full low-rank factorization jointly with guarantees that describe only the joint update, not individual rank-1 directions. We present AdaPaD (Adaptive Parallel Deflation), which trains all rank-1 components simultaneously: each worker refines its component against a deflation target built from the latest estimates of all predecessors, and as those estimates improve, the targets improve too. We call this property self-correction: deflation errors converge to zero over rounds rather than persisting as fixed residuals. On top of this backbone, AdaPaD adds advance learning (private pre-training before activation) and per-module dynamic rank discovery (importance-based growth until a shared budget is exhausted), making the rank distribution an output rather than an input. We prove that every component's error decays exponentially after a warm-up period, with a generalization bound that splits into a vanishing algorithmic term and an irreducible statistical floor. Empirically, AdaPaD is competitive with adaptive-rank LoRA baselines on GLUE with DeBERTaV3-base at matched parameter budgets, and competitive with fixed-rank LoRA on Qwen3-0.6B SQuAD/SQuAD v2 while deploying an adapter that is on average 30.7% smaller.
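The self-correction property can be seen in a toy setting. The sketch below is an assumption-laden analogue, not AdaPaD's training loop: instead of fine-tuning LoRA with gradients, it recovers the top rank-1 SVD components of a fixed matrix with alternating power iteration. All `k` workers update simultaneously, and in every round worker `i` deflates the matrix using the previous round's estimates of components `0..i-1`, so early mistakes are corrected rather than frozen into later residuals. The names `make_low_rank` and `parallel_deflation` are hypothetical.

```python
import numpy as np


def make_low_rank(m, n, singular_values, seed=0):
    """Toy stand-in for a weight update: M = U diag(s) V^T with orthonormal U, V."""
    rng = np.random.default_rng(seed)
    r = len(singular_values)
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return U @ np.diag(singular_values) @ V.T, U, V


def parallel_deflation(M, k, rounds, seed=1):
    """Refine k rank-1 components of M simultaneously (Jacobi-style).

    Each round, worker i builds its deflation target from the latest
    (previous-round) estimates of components 0..i-1 and takes one
    alternating power step on it. Returns each round's estimates.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    us = [x / np.linalg.norm(x) for x in rng.standard_normal((k, m))]
    vs = [np.zeros(n) for _ in range(k)]
    sigmas = np.zeros(k)  # all-zero estimates, so every round-0 target is M
    snapshots = []
    for _ in range(rounds):
        # Every worker reads the same snapshot of the current estimates.
        comps = [sigmas[j] * np.outer(us[j], vs[j]) for j in range(k)]
        new_us, new_vs, new_sigmas = [], [], np.zeros(k)
        for i in range(k):
            target = M - sum(comps[:i], np.zeros_like(M))  # deflation target
            v = target.T @ us[i]
            v /= np.linalg.norm(v)
            u = target @ v
            new_sigmas[i] = np.linalg.norm(u)
            new_us.append(u / new_sigmas[i])
            new_vs.append(v)
        us, vs, sigmas = new_us, new_vs, new_sigmas
        snapshots.append([sigmas[j] * np.outer(us[j], vs[j]) for j in range(k)])
    return snapshots


s = [5.0, 4.0, 3.0, 2.0, 1.0]
M, U, V = make_low_rank(30, 20, s)
k = 3
truth = [s[i] * np.outer(U[:, i], V[:, i]) for i in range(k)]
snaps = parallel_deflation(M, k=k, rounds=300)
# Frobenius error of each component estimate against the true component, per round.
errors = np.array([[np.linalg.norm(c - t) for c, t in zip(snap, truth)] for snap in snaps])
print(np.round(errors[[0, 20, 100, -1]], 6))
```

In the first round every worker chases the top component, so later workers start far from their targets; as worker 0 converges, the targets of workers 1 and 2 sharpen, and their errors fall toward machine precision instead of settling at a fixed residual. Comparing rank-1 outer products rather than raw vectors keeps the error measure independent of the sign ambiguity in `u` and `v`.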

Problem

Research questions and friction points this paper is trying to address.

LoRA

rank selection

parameter-efficient fine-tuning

adaptive rank

low-rank adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Parallel Deflation

Self-Correcting Rank Discovery

Parameter-Efficient Fine-Tuning