Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms

📅 2026-05-08

🤖 AI Summary

This work addresses a limitation of conventional Low-Rank Adaptation (LoRA). Its static parameterization cannot adjust the update to the input or to network depth, which constrains model expressiveness. The authors propose Queryable Low-Rank Adaptation (Queryable LoRA), a dynamic mechanism built on a shared library of low-rank update atoms. For each block of layers, the model forms a query from the current low-rank state and a running summary of previous blocks. An attention-based routing module then uses this query to combine atoms into a context-aware low-rank update. The method also introduces instruction regularization, which adds a language-induced prior to the routing logits so that atom selection is biased toward semantically relevant directions. This turns LoRA into an input-adaptive framework whose update components are shared across layers. Experiments on noisy nonlinear regression and large language model fine-tuning suggest that it improves final test performance and training stability over standard LoRA, with a comparable number of trainable parameters.
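A minimal sketch of the routing step follows, in pure Python with no dependencies. It assumes one plausible reading of the mechanism:

- Each shared atom is an r×r matrix with its own key vector.
- The query is a linear projection of the concatenated low-rank state z = A x and a summary vector s.
- The routed operator is the attention-weighted sum of atoms, applied between the down-projection A and the up-projection B.

The function `routed_lora_forward` and its parameter names (`W_q`, `keys`, `atoms`) are hypothetical illustrations, not the paper's code.

```python
import math

# Hypothetical sketch of one queryable-LoRA layer; not the paper's implementation.

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def routed_lora_forward(x, W0, A, B, W_q, keys, atoms, s):
    """Compute W0 x + B (sum_m alpha_m U_m) A x under assumed shapes.

    x: input (d_in); W0: frozen weight (d_out x d_in)
    A: down-projection (r x d_in); B: up-projection (d_out x r)
    W_q: query projection (d_k x (r + d_s)); keys: M vectors of length d_k
    atoms: M shared r x r update atoms U_m; s: running summary (d_s)
    Returns (output, routing weights alpha).
    """
    z = matvec(A, x)                          # current low-rank state
    q = matvec(W_q, z + s)                    # query from [z; s] (list concatenation)
    d_k = len(q)
    # Scaled dot-product scores between the query and each atom's key
    logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
    alpha = softmax(logits)                   # attention weights over shared atoms
    r = len(z)
    # Routed operator R = sum_m alpha_m U_m, an r x r matrix
    R = [[sum(a * U[i][j] for a, U in zip(alpha, atoms)) for j in range(r)]
         for i in range(r)]
    delta = matvec(B, matvec(R, z))           # update applied inside the bottleneck
    base = matvec(W0, x)                      # frozen pretrained path
    return [b + d for b, d in zip(base, delta)], alpha
```

A quick sanity check of this reading: with a single atom equal to the 1×1 identity, the routing weight is exactly 1 and the layer reduces to standard LoRA, W0 x + B A x. Take x = [1.0, 2.0], W0 = identity, A = [[0.5, 0.5]], B = [[1.0], [2.0]], `W_q = [[1.0, 0.0]]`, `keys = [[1.0]]`, `atoms = [[[1.0]]]` and s = [0.0]. The function then returns the output [2.5, 5.0] with routing weights [1.0].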

📝 Abstract

We present a data-adaptive method for parameter-efficient fine-tuning of large neural networks. Standard low-rank adaptation methods improve efficiency by restricting each layer update to a fixed low-rank form, but this static parameterization can be too rigid when the appropriate correction depends on the input and on the evolving depth-wise computation of the network. Our approach replaces a purely layer-local adapter with a shared queryable memory of low-rank update atoms. For each block of layers, the model forms a query from the current low-rank state and a running summary of previous blocks, uses this query to retrieve a content-dependent combination of shared update components via attention, and applies the resulting routed operator within the low-rank bottleneck. In this way, the method retains the efficiency and scalability of low-rank adaptation while allowing the effective update to vary across inputs and to share reusable structure across layers. The resulting architecture provides a principled middle ground between static LoRA-style updates and fully generated parameter updates: it remains compact and parameter-efficient while supporting dynamic, context-sensitive adaptation. Further, we incorporate instruction-regularization by augmenting routing logits with a language-induced prior over update atoms, thereby biasing the selection of low-rank transformations toward semantically relevant directions without generating unconstrained parameter updates. Experiments on noisy non-linear regression tasks and LLM fine-tuning suggest that this queryable update-memory formulation can improve final test performance and training stability compared to standard low-rank adaptation, while using a comparable number of trainable parameters.
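The instruction regularization and cross-block running summary described in the abstract can be sketched as below. The abstract does not specify these details, so several choices here are assumptions:

- The language-induced prior is a log-softmax over similarities between a hypothetical instruction embedding and per-atom text embeddings.
- A scalar `lam` sets the prior's strength.
- The running summary of previous blocks is an exponential moving average of the low-rank states.
- The query is simply the concatenation [z; s], with no learned projection.

All function names are illustrative.

```python
import math

# Hypothetical sketch of instruction-regularized routing; every design choice
# below (log-softmax prior, EMA summary, no query projection) is an assumption.

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return [l - log_z for l in logits]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def instruction_prior(instr_emb, atom_embs):
    """Assumed language-induced prior: log p(atom m | instruction)."""
    return log_softmax([dot(instr_emb, c) for c in atom_embs])

def regularized_routing(query, keys, prior, lam):
    """Scaled dot-product logits over atoms, shifted by lam * prior, then softmax."""
    d_k = len(query)
    logits = [dot(query, k) / math.sqrt(d_k) + lam * p for k, p in zip(keys, prior)]
    return softmax(logits)

def update_summary(s, z, beta=0.9):
    """Running summary of previous blocks as an exponential moving average (assumption)."""
    return [beta * si + (1.0 - beta) * zi for si, zi in zip(s, z)]

def route_across_blocks(states, keys, prior, lam, beta=0.9):
    """Route each block's low-rank state z_b using the query [z_b; s_b].

    Simplification: keys live directly in the space of [z; s], so each key
    has length 2 * r and no learned query projection is applied.
    """
    s = [0.0] * len(states[0])
    weights = []
    for z in states:
        query = z + s                             # current state + summary of earlier blocks
        weights.append(regularized_routing(query, keys, prior, lam))
        s = update_summary(s, z, beta)            # summary seen by the next block
    return weights
```

The prior only shifts the logits before the softmax, so the result is still a convex combination of the existing atoms. A large `lam` concentrates the routing weights on atoms the instruction favors, while `lam = 0` recovers purely content-based routing. This matches the abstract's point that the bias acts on atom selection rather than generating unconstrained parameter updates.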

Problem

Research questions and friction points this paper is trying to address.

low-rank adaptation

parameter-efficient fine-tuning

dynamic update

context-sensitive adaptation

static parameterization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Queryable LoRA

low-rank adaptation

dynamic routing