🤖 AI Summary
Large language models (LLMs) incur substantial computational overhead under full fine-tuning, while existing parameter-efficient fine-tuning (PEFT) methods often fall short of its performance. To address this, we propose DiaBlo, a minimalist PEFT approach that updates only the diagonal blocks of selected weight matrices. This structured diagonal-block update matches full fine-tuning performance without low-rank decomposition, auxiliary initialization schemes, or additional modules. Relying on structured parameter freezing and standard AdamW optimization, DiaBlo uses only conventional forward/backward propagation, which markedly improves convergence stability and generalization consistency. Across diverse tasks, including commonsense reasoning, arithmetic reasoning, code generation, and safety alignment, DiaBlo matches full fine-tuning in accuracy. Its memory footprint and training speed are comparable to LoRA, yet it exhibits superior convergence robustness.
📝 Abstract
Fine-tuning is a critical step for adapting large language models (LLMs) to domain-specific downstream tasks. To mitigate the substantial computational and memory costs of full-model fine-tuning, Parameter-Efficient Fine-Tuning (PEFT) methods have been proposed to update only a small subset of model parameters. However, performance gaps between PEFT approaches and full-model fine-tuning persist. In this work, we present DiaBlo, a simple yet effective PEFT approach that updates only the diagonal blocks of selected model weight matrices. Unlike Low-Rank Adaptation (LoRA) and its variants, DiaBlo eliminates the need for low-rank matrix products, thereby avoiding the reliance on auxiliary initialization schemes or customized optimization strategies to improve convergence. This design leads to stable and robust convergence while maintaining memory efficiency and training speed comparable to LoRA. We conduct extensive experiments across a range of tasks, including commonsense reasoning, arithmetic reasoning, code generation, and safety alignment, to evaluate the effectiveness and efficiency of DiaBlo. Across these benchmarks, DiaBlo demonstrates strong and consistent performance while maintaining high memory efficiency and fast fine-tuning speed. Code is available at https://github.com/ziyangjoy/DiaBlo.
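To make the core idea concrete, here is a minimal NumPy sketch of a diagonal-block update. It is an illustration of the structured sparsity pattern only, not the paper's actual implementation: the helper name `diagonal_block_mask`, the block count, and the plain SGD step are all assumptions for demonstration (the paper uses standard AdamW on real model weights).

```python
import numpy as np

def diagonal_block_mask(d_out, d_in, num_blocks):
    """Boolean mask selecting the diagonal blocks of a (d_out, d_in) matrix.

    Hypothetical helper illustrating DiaBlo-style structured sparsity:
    only entries inside the num_blocks diagonal blocks are trainable.
    """
    assert d_out % num_blocks == 0 and d_in % num_blocks == 0
    bo, bi = d_out // num_blocks, d_in // num_blocks
    mask = np.zeros((d_out, d_in), dtype=bool)
    for k in range(num_blocks):
        mask[k * bo:(k + 1) * bo, k * bi:(k + 1) * bi] = True
    return mask

# An 8x8 weight split into 4 diagonal blocks: 4 * (2x2) = 16 of 64
# entries are trainable; the rest stay frozen.
mask = diagonal_block_mask(8, 8, 4)
print(int(mask.sum()), mask.size)  # 16 64

# A single (illustrative) gradient step touches only the diagonal blocks.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
grad = rng.standard_normal((8, 8))
W_new = W - 0.01 * (grad * mask)   # off-block entries are unchanged
print(bool(np.allclose(W_new[~mask], W[~mask])))  # True
```

Because the update is applied in place on the original weights, there is no adapter to merge at inference time, unlike LoRA's low-rank product.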