AFA-LoRA: Enabling Non-Linear Adaptations in LoRA with Activation Function Annealing

📅 2025-12-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
LoRA’s linear adaptation structure inherently limits its representational capacity, creating an expressivity gap relative to non-linear fine-tuning. To address this, we propose Activation Annealing: a training strategy that introduces learnable non-linear activations (e.g., Sigmoid or GELU) into the adapter during early training to enhance modeling capability, then progressively anneals them to the identity, yielding a strictly mergeable LoRA module at convergence. The method applies across diverse training paradigms, including supervised fine-tuning (SFT), reinforcement learning, and speculative decoding, while preserving LoRA’s low GPU memory footprint and deployment compatibility. Empirical results show that Activation Annealing significantly narrows the performance gap between LoRA and full-parameter fine-tuning, achieving near-parity with, and in some cases matching, full-parameter performance across multiple benchmarks. To the best of our knowledge, this is the first LoRA enhancement framework to enable dynamic “non-linear training, linear inference” adaptation.
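To make the mechanism concrete, below is a minimal sketch of the “non-linear training, linear inference” idea, assuming a LoRA branch whose activation is linearly annealed toward the identity over training. The names (AnnealedActivation, AFALoRALinear, anneal_schedule) and the interpolation form are illustrative assumptions, not the paper’s implementation.

```python
# Illustrative sketch only: assumes a linear interpolation between a non-linear
# activation and the identity; this is not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnnealedActivation(nn.Module):
    """Interpolates between a non-linear activation and the identity map.

    alpha = 1.0 -> fully non-linear (early training)
    alpha = 0.0 -> identity, so the adapter branch is purely linear and mergeable
    """

    def __init__(self, base_act=F.gelu):
        super().__init__()
        self.base_act = base_act
        self.register_buffer("alpha", torch.tensor(1.0))

    def set_alpha(self, alpha: float):
        self.alpha.fill_(float(alpha))

    def forward(self, x):
        return self.alpha * self.base_act(x) + (1.0 - self.alpha) * x


class AFALoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank branch with an annealed activation."""

    def __init__(self, base: nn.Linear, rank: int = 8, scaling: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # keep the pretrained weights frozen
            p.requires_grad_(False)
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # adapter starts as a zero update
        self.act = AnnealedActivation()
        self.scaling = scaling

    def forward(self, x):
        # Non-linear branch while alpha > 0; collapses to scaling * B(A(x)) as alpha -> 0.
        return self.base(x) + self.scaling * self.lora_B(self.act(self.lora_A(x)))


def anneal_schedule(step: int, total_steps: int, anneal_fraction: float = 0.5) -> float:
    """Drive alpha linearly from 1 to 0 over the first `anneal_fraction` of training."""
    cutoff = max(1, int(total_steps * anneal_fraction))
    return max(0.0, 1.0 - step / cutoff)
```

During training one would call adapter.act.set_alpha(anneal_schedule(step, total_steps)) at each step; once alpha reaches zero the branch is exactly a linear low-rank update.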

📝 Abstract
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method. However, its linear adaptation process limits its expressive power, leaving a gap between what linear and non-linear training can represent. To bridge this gap, we propose AFA-LoRA, a novel training strategy that brings non-linear expressivity to LoRA while maintaining its seamless mergeability. Our key innovation is an annealed activation function that transitions from a non-linear to a linear transformation during training, allowing the adapter to first acquire stronger representational capability before converging to a mergeable linear form. We apply our method to supervised fine-tuning, reinforcement learning, and speculative decoding. The results show that AFA-LoRA reduces the performance gap between LoRA and full-parameter training, enabling a more powerful and practical paradigm of parameter-efficient adaptation.
Problem

Research questions and friction points this paper is trying to address.

How to bridge the expressivity gap between linear and non-linear training in LoRA
How to introduce non-linear adaptability through annealed activations while maintaining mergeability
How to improve performance across supervised fine-tuning, reinforcement learning, and speculative decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces an annealed activation function for non-linear adaptation
Transitions the adapter from a non-linear to a linear transformation during training
Maintains seamless mergeability while enhancing expressive power (see the merging sketch after this list)
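The mergeability claim above follows because the annealed branch becomes exactly a low-rank linear update at convergence. Below is a hedged sketch of folding that update into the frozen base weight, reusing the illustrative AFALoRALinear class from the earlier sketch and the standard LoRA merge rule W ← W + scaling·B·A; the helper name merge_adapter is an assumption, not the paper's API.

```python
# Illustrative merge step: valid only once the activation has annealed to identity.
import torch


@torch.no_grad()
def merge_adapter(adapter) -> torch.nn.Linear:
    """Fold the low-rank update into the base weight: W <- W + scaling * B @ A."""
    assert float(adapter.act.alpha) == 0.0, "anneal to identity before merging"
    delta = adapter.scaling * adapter.lora_B.weight @ adapter.lora_A.weight
    adapter.base.weight += delta              # in-place update of the pretrained weight
    return adapter.base                       # plain nn.Linear, no extra inference cost
```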
Authors
Jiacheng Li (Meituan, Beijing, China)
Jianchao Tan (Meituan)
Zhidong Yang (Hong Kong University of Science and Technology, Hong Kong SAR, China)
Feiye Huo (Meituan, Beijing, China)
Yerui Sun (Meituan, Beijing, China)
Yuchen Xie (Meituan, Beijing, China)
Xunliang Cai (Meituan, Beijing, China)