DiffLoRA: Differential Low-Rank Adapters for Large Language Models

πŸ“… 2025-07-31
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the coarse-grained modeling and noise sensitivity of attention mechanisms in parameter-efficient fine-tuning (PEFT), this paper proposes DiffLoRA, a method that integrates differential attention with low-rank adaptation. Specifically, it places learnable low-rank adapters on both the positive and negative terms of the Transformer self-attention logits, explicitly parameterizing the differential structure of the attention weights to enable noise-cancellation-style attention within a PEFT setting, while retaining LoRA-level parameter efficiency (<0.5% additional parameters). Evaluated across a broad range of NLP tasks (general benchmarks, many-shot in-context learning, RAG, and long-context tests), DiffLoRA falls short of other PEFT methods on most tasks, but outperforms standard LoRA by 11 points on the HumanEval code-generation benchmark; the authors analyze post-finetuning attention patterns to explain this task-dependent behavior.

πŸ“ Abstract
Differential Transformer has recently been proposed to improve performance in Transformer models by canceling out noise through a denoiser attention mechanism. In this work, we introduce DiffLoRA, a parameter-efficient adaptation of the differential attention mechanism, with low-rank adapters on both positive and negative attention terms. This approach retains the efficiency of LoRA while aiming to benefit from the performance gains of differential attention. We evaluate DiffLoRA across a broad range of NLP tasks, including general benchmarks, many-shot in-context learning, RAG, and long-context tests. We observe that, although DiffLoRA falls short of other parameter-efficient fine-tuning methods in most evaluation tasks, it shows interesting results in certain domains (+11 pts on LoRA for HumanEval). We analyze the attention patterns post-finetuning to identify the reasons for this behavior.
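The mechanism the abstract describes (low-rank adapters on both the positive and negative attention terms) can be sketched as follows. This is a minimal NumPy illustration under assumed details not stated in the source: a single head, a fixed scalar λ, and adapters on the query/key projections only; the class name `DiffLoRAHead` and all weight names are hypothetical, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class DiffLoRAHead:
    """Sketch of one differential-attention head with LoRA adapters on the
    positive (q1/k1) and negative (q2/k2) projections. Hypothetical layout."""

    def __init__(self, d_model=16, d_head=8, rank=2, lam=0.5, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *s: rng.standard_normal(s) / np.sqrt(s[0])
        # Frozen pretrained projection weights.
        self.W = {n: init(d_model, d_head) for n in ("q1", "k1", "q2", "k2", "v")}
        # Trainable low-rank adapters: delta_W = A @ B; B starts at zero,
        # so the adapted head initially matches the frozen one (as in LoRA).
        self.A = {n: init(d_model, rank) for n in ("q1", "k1", "q2", "k2")}
        self.B = {n: np.zeros((rank, d_head)) for n in ("q1", "k1", "q2", "k2")}
        self.lam = lam
        self.scale = 1.0 / np.sqrt(d_head)

    def proj(self, X, n):
        W = self.W[n]
        if n in self.A:  # add the low-rank update where an adapter exists
            W = W + self.A[n] @ self.B[n]
        return X @ W

    def forward(self, X):
        q1, k1 = self.proj(X, "q1"), self.proj(X, "k1")
        q2, k2 = self.proj(X, "q2"), self.proj(X, "k2")
        v = self.proj(X, "v")
        # Differential attention: subtract a second softmax map, scaled by
        # lam, so shared "noise" mass cancels between the two maps.
        s1 = softmax(q1 @ k1.T * self.scale)
        s2 = softmax(q2 @ k2.T * self.scale)
        return (s1 - self.lam * s2) @ v
```

During fine-tuning only the `A`/`B` factors would be updated, which is what keeps the extra parameter count at the LoRA level while still adapting both sides of the attention difference.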
Problem

Research questions and friction points this paper is trying to address.

Attention in PEFT settings is coarse-grained and sensitive to noise
Whether differential attention's denoising benefits can be obtained at LoRA-level parameter cost
How such an adaptation behaves across diverse NLP tasks (general benchmarks, many-shot in-context learning, RAG, long context)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank adapters on both the positive and negative terms of differential attention
Retains LoRA's parameter efficiency while parameterizing the attention-weight difference
Post-finetuning attention-pattern analysis explaining task-dependent gains (+11 pts over LoRA on HumanEval)