Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard LoRA applies the same fixed low-rank adaptation weights to every input token, limiting its capacity to capture token-level semantic distinctions. To address this, the authors propose TopLoRA, which introduces token-wise dynamic weighting: a diagonal matrix Σ_X is generated from each input token, extending the adaptation from BA to BΣ_XA. This design enables fine-grained, semantics-aware input-output projections without increasing the rank of the adaptation weights. By relaxing the conventional weight-sharing constraint, TopLoRA enhances the representational capacity of parameter-efficient fine-tuning. Experiments across large language models (e.g., LLaMA, Qwen) and diverse downstream tasks (e.g., GLUE, MT-Bench) show that TopLoRA consistently outperforms LoRA and its prominent variants, supporting both its effectiveness and its generalization across settings.

📝 Abstract
Low-rank adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method widely used in large language models (LLMs). LoRA essentially describes the projection of an input space into a low-dimensional output space, with the dimensionality determined by the LoRA rank. In standard LoRA, all input tokens share the same weights and undergo an identical input-output projection. This limits LoRA's ability to capture token-specific information due to the inherent semantic differences among tokens. To address this limitation, we propose Token-wise Projected Low-Rank Adaptation (TopLoRA), which dynamically adjusts LoRA weights according to the input token, thereby learning token-wise input-output projections in an end-to-end manner. Formally, the weights of TopLoRA can be expressed as $B\Sigma_X A$, where $A$ and $B$ are low-rank matrices (as in standard LoRA), and $\Sigma_X$ is a diagonal matrix generated from each input token $X$. Notably, TopLoRA does not increase the rank of LoRA weights but achieves more granular adaptation by learning token-wise LoRA weights (i.e., token-wise input-output projections). Extensive experiments across multiple models and datasets demonstrate that TopLoRA consistently outperforms LoRA and its variants. The code is available at https://github.com/Leopold1423/toplora-neurips25.
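The token-wise update described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: in particular, the linear generator `W_sigma` that maps each token to the r diagonal entries of Σ_X is an assumption for demonstration, since the abstract does not specify how Σ_X is produced.

```python
import numpy as np

def toplora_delta(X, A, B, W_sigma):
    """Token-wise projected low-rank update: delta(x) = B @ diag(sigma_x) @ A @ x.

    X:       (n, d_in)  input tokens, one per row
    A:       (r, d_in)  LoRA down-projection (shared across tokens)
    B:       (d_out, r) LoRA up-projection (shared across tokens)
    W_sigma: (d_in, r)  hypothetical generator of Sigma_X's diagonal
    """
    S = X @ W_sigma        # (n, r): per-token diagonal entries of Sigma_X
    Z = X @ A.T            # (n, r): down-project each token into the rank-r space
    return (S * Z) @ B.T   # (n, d_out): rescale the r channels per token, up-project

# Tiny demo: the batched form matches the explicit per-token B @ diag(.) @ A.
rng = np.random.default_rng(0)
n, d_in, d_out, r = 3, 8, 6, 2
X = rng.standard_normal((n, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))
W_sigma = rng.standard_normal((d_in, r))

fast = toplora_delta(X, A, B, W_sigma)
slow = np.stack([B @ np.diag(x @ W_sigma) @ A @ x for x in X])
assert np.allclose(fast, slow)
```

Because Σ_X is diagonal, the per-token reweighting reduces to an elementwise product in the rank-r space, so no per-token matrix ever needs to be materialized.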
Problem

Research questions and friction points this paper is trying to address.

Standard LoRA shares the same adaptation weights across all input tokens
This limits its ability to capture token-specific semantic information
How can adaptation weights vary per token without increasing the LoRA rank?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-wise LoRA weights learned end-to-end for more granular adaptation
Dynamic weight adjustment conditioned on each input token
A diagonal matrix Σ_X yields token-specific projections at the same rank
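The rank claim above can be checked numerically: inserting a diagonal matrix between B and A cannot raise the rank of the effective update beyond r, so TopLoRA's per-token update B Σ_x A has the same rank budget as standard LoRA's BA. A numpy sketch, with arbitrary illustrative values for the diagonal of Σ_x:

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 16, 12, 4
A = rng.standard_normal((r, d_in))    # LoRA down-projection
B = rng.standard_normal((d_out, r))   # LoRA up-projection
sigma_x = rng.standard_normal(r)      # illustrative token-specific diagonal of Sigma_x

lora_update = B @ A                        # shared across all tokens
toplora_update = B @ np.diag(sigma_x) @ A  # varies per token

# Both updates live in the same rank-r subspace: rank(B diag(s) A) <= r.
assert np.linalg.matrix_rank(lora_update) <= r
assert np.linalg.matrix_rank(toplora_update) <= r
```

The extra capacity therefore comes from how the r channels are mixed per token, not from a larger subspace.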