🤖 AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods—e.g., LoRA—rely on a global low-rank assumption, limiting their ability to capture local and multi-scale structures in weight updates. To address this, we propose WaRA, a novel wavelet-based PEFT paradigm: it first projects weight updates into the wavelet domain, then performs low-rank decomposition and reconstruction therein, enabling natural multi-resolution modeling. This design jointly enforces sparsity, locality sensitivity, and global consistency, substantially enhancing expressivity for complex update patterns. Experiments demonstrate that WaRA achieves superior performance with lower computational overhead across diverse vision tasks—including image generation, classification, and semantic segmentation. Moreover, it exhibits strong generalization on language tasks, validating its cross-modal applicability.
📝 Abstract
Parameter-efficient fine-tuning (PEFT) has gained widespread adoption across various applications. Among PEFT techniques, Low-Rank Adaptation (LoRA) and its extensions have emerged as particularly effective, allowing efficient model adaptation while significantly reducing computational overhead. However, existing approaches typically rely on global low-rank factorizations, which overlook local or multi-scale structure and fail to capture complex patterns in the weight updates. To address this, we propose WaRA, a novel PEFT method that leverages wavelet transforms to decompose the weight update matrix into a multi-resolution representation. By performing low-rank factorization in the wavelet domain and reconstructing updates through an inverse transform, WaRA obtains compressed adaptation parameters that harness multi-resolution analysis, enabling it to capture both coarse and fine-grained features while providing greater flexibility and sparser representations than standard LoRA. Through comprehensive experiments and analysis, we demonstrate that WaRA achieves superior performance on diverse vision tasks, including image generation, classification, and semantic segmentation, significantly enhancing generated image quality while reducing computational complexity. Although WaRA was primarily designed for vision tasks, we further showcase its effectiveness in language tasks, highlighting its broader applicability and generalizability. The code is publicly available at [GitHub](https://github.com/moeinheidari7829/WaRA).
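To make the core idea concrete, here is a minimal NumPy sketch of the update rule the abstract describes: trainable low-rank factors live in the wavelet domain, and the weight update is recovered via the inverse transform. This uses a single-level orthonormal Haar transform for illustration; the paper's actual wavelet basis, decomposition depth, and factorization details may differ, and the names (`haar_matrix`, `B`, `A`) are illustrative, not from the released code.

```python
import numpy as np

def haar_matrix(n):
    """Single-level orthonormal Haar analysis matrix (n must be even)."""
    assert n % 2 == 0
    H = np.zeros((n, n))
    s = 1.0 / np.sqrt(2.0)
    for i in range(n // 2):
        H[i, 2 * i] = s                    # low-pass (coarse averages)
        H[i, 2 * i + 1] = s
        H[n // 2 + i, 2 * i] = s           # high-pass (fine details)
        H[n // 2 + i, 2 * i + 1] = -s
    return H

rng = np.random.default_rng(0)
m, n, r = 8, 8, 2                          # weight shape and adaptation rank
Hm, Hn = haar_matrix(m), haar_matrix(n)

# Trainable low-rank factors, parameterized in the wavelet domain.
B = rng.standard_normal((m, r)) * 0.1
A = rng.standard_normal((r, n)) * 0.1

delta_W_wavelet = B @ A                    # low-rank update in the wavelet domain
delta_W = Hm.T @ delta_W_wavelet @ Hn      # inverse 2D Haar transform back to weight space

# The transform is orthonormal, so it preserves rank and Frobenius norm:
assert np.linalg.matrix_rank(delta_W) <= r
assert np.isclose(np.linalg.norm(delta_W), np.linalg.norm(delta_W_wavelet))
```

Because the Haar matrices are orthonormal, the inverse transform is just a transpose, so the reconstruction adds negligible overhead on top of the LoRA-style `B @ A` product while letting the factors allocate capacity across coarse and fine wavelet sub-bands.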