🤖 AI Summary
LoRA’s linear adaptation structure inherently limits its representational capacity, creating an expressivity gap relative to nonlinear fine-tuning. To close this gap, we propose Activation Annealing: a training strategy that inserts learnable nonlinear activations (e.g., Sigmoid or GELU) into the adapter during early training to strengthen its modeling capability, then progressively anneals them toward linearity, yielding a strictly mergeable LoRA module at convergence. The method applies across diverse training paradigms, including supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and speculative decoding, while preserving LoRA’s low GPU memory footprint and deployment compatibility. Empirical results show that Activation Annealing significantly narrows the gap between LoRA and full-parameter fine-tuning, reaching near-parity with full-parameter performance on multiple benchmarks. To the best of our knowledge, this is the first LoRA enhancement framework to enable a dynamic “nonlinear training, linear inference” adaptation scheme.
📝 Abstract
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method, but its purely linear adaptation limits its expressive power, leaving a gap between linear adapter training and nonlinear full-parameter training. To bridge this gap, we propose AFA-LoRA, a novel training strategy that brings nonlinear expressivity to LoRA while maintaining its seamless mergeability. Our key innovation is an annealed activation function that transitions from a nonlinear to a linear transformation during training, allowing the adapter to exploit stronger representational capacity early on before converging to a mergeable linear form. We apply our method to supervised fine-tuning, reinforcement learning, and speculative decoding. The results show that AFA-LoRA narrows the performance gap between LoRA and full-parameter training, enabling a more powerful and practical paradigm for parameter-efficient adaptation.
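The annealing idea above can be illustrated with a minimal NumPy sketch. This is a hypothetical rendering, not the paper's exact formulation: it assumes the activation sits between LoRA's down- and up-projections and that annealing is a simple convex blend between GELU and the identity, controlled by a coefficient `alpha` that moves from 0 (fully nonlinear) to 1 (fully linear) over training. The function names and the linear schedule are illustrative assumptions.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def annealed_activation(x, alpha):
    """Blend a nonlinear activation with the identity.

    alpha = 0.0 -> pure GELU (maximum expressivity, not mergeable)
    alpha = 1.0 -> pure identity (linear, hence mergeable)
    """
    return (1.0 - alpha) * gelu(x) + alpha * x

def adapter_delta(x, A, B, alpha):
    """Hypothetical AFA-LoRA-style adapter path: B @ f(A @ x).

    Once alpha reaches 1, the path reduces to B @ A @ x, so the
    update delta_W = B @ A can be merged into the frozen weight
    exactly as in plain LoRA.
    """
    return B @ annealed_activation(A @ x, alpha)

def alpha_schedule(step, total_steps):
    # One possible schedule: linear ramp to full linearity.
    return min(1.0, step / total_steps)
```

At `alpha = 1.0`, `adapter_delta(x, A, B, 1.0)` equals `(B @ A) @ x`, which is why the converged adapter stays mergeable despite the nonlinear early phase.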