SALAAD: Sparse And Low-Rank Adaptation via ADMM for Large Language Model Inference

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of flexible capacity control in large language models under computational and memory constraints, where existing approaches often rely on heuristic designs or architecture-specific modifications. The authors propose SALAAD, a framework that jointly imposes sparse and low-rank structures via the augmented Lagrangian method and introduces an adaptive controller to dynamically balance task loss against structural constraints. This enables explicit, continuous regulation of model capacity without requiring retraining, yielding a spectrum of models spanning diverse memory budgets from a single training run. SALAAD is architecture-agnostic and demonstrates strong empirical performance: it substantially reduces deployment memory overhead while matching the accuracy of bespoke solutions, all while maintaining training stability and offering fine-grained capacity adjustability.

📝 Abstract
Modern large language models are increasingly deployed under compute and memory constraints, making flexible control of model capacity a central challenge. While sparse and low-rank structures naturally trade off capacity and performance, existing approaches often rely on heuristic designs that ignore layer and matrix heterogeneity or require model-specific architectural modifications. We propose SALAAD, a plug-and-play framework applicable to different model architectures that induces sparse and low-rank structures during training. By formulating structured weight learning under an augmented Lagrangian framework and introducing an adaptive controller that dynamically balances the training loss and structural constraints, SALAAD preserves the stability of standard training dynamics while enabling explicit control over the evolution of effective model capacity during training. Experiments across model scales show that SALAAD substantially reduces memory consumption during deployment while achieving performance comparable to ad-hoc methods. Moreover, a single training run yields a continuous spectrum of model capacities, enabling smooth and elastic deployment across diverse memory budgets without the need for retraining.
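The abstract does not spell out SALAAD's optimization details, but the kind of sparse-plus-low-rank splitting that an augmented Lagrangian / ADMM formulation enables can be illustrated on a single frozen weight matrix. The sketch below is an assumption-laden toy (classic ADMM for min ||L||_* + λ||S||_1 s.t. L + S = W, not the paper's training-time method with its adaptive controller); the function names `svt`, `shrink`, and `sparse_plus_lowrank_admm` are invented for illustration.

```python
import numpy as np

def svt(X, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    # Soft thresholding: proximal operator of the l1 norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def sparse_plus_lowrank_admm(W, lam=0.2, rho=1.0, iters=200):
    """Decompose W ~= L + S with L low-rank and S sparse, via ADMM on
    min ||L||_* + lam * ||S||_1  subject to  L + S = W."""
    L = np.zeros_like(W)
    S = np.zeros_like(W)
    Y = np.zeros_like(W)  # dual variable enforcing L + S = W
    for _ in range(iters):
        L = svt(W - S + Y / rho, 1.0 / rho)       # low-rank update
        S = shrink(W - L + Y / rho, lam / rho)     # sparse update
        Y = Y + rho * (W - L - S)                  # dual ascent step
    return L, S
```

In SALAAD these structural constraints are imposed during training and balanced against the task loss by an adaptive controller, rather than applied post hoc to fixed weights as in this toy decomposition.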
Problem

Research questions and friction points this paper is trying to address.

model capacity control
sparse and low-rank adaptation
memory-constrained deployment
large language models
structured weight learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse and Low-Rank Adaptation
ADMM
Augmented Lagrangian
Model Capacity Control
Plug-and-Play Framework