Hyper Compressed Fine-Tuning of Large Foundation Models with Quantum Inspired Adapters

πŸ“… 2025-02-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the prohibitive computational and memory overhead of full-parameter fine-tuning for large foundation models, this paper proposes Quantum-Inspired Adapters, a parameter-efficient fine-tuning method that combines the Hamming-weight-preserving structure of quantum circuits, orthogonality constraints on weight updates, and matrix compounding inside a compact adapter module. The adapters operate in a combinatorially large space while rigorously preserving parameter orthogonality, retaining representational capacity at a fraction of the usual parameter cost. On the GLUE and VTAB benchmarks, the method reaches 99.2% of the performance of LoRA with roughly 44x fewer trainable parameters, and 98% of the relative performance of orthogonal fine-tuning methods such as OFT and BOFT with 25x fewer parameters. Ablation studies indicate that combining multiple Hamming-weight orders with orthogonality and matrix compounding is essential for performant fine-tuning, establishing a lightweight paradigm for adapting large language and vision models.
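For intuition on the "combinatorially large space" mentioned above (a sketch based on general facts about Hamming-weight-preserving circuits, not on details taken from the paper): such a circuit on n qubits block-diagonalises over fixed-weight subspaces, so a modest number of trainable angles parameterises orthogonal transformations whose dimensions grow combinatorially,

$$
\dim \mathcal{H}_k = \binom{n}{k}, \qquad \sum_{k=0}^{n} \binom{n}{k} = 2^{n},
$$

i.e. restricting the circuit to Hamming weight $k$ yields an orthogonal matrix of size $\binom{n}{k} \times \binom{n}{k}$ generated from far fewer underlying parameters.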

πŸ“ Abstract
Fine-tuning pre-trained large foundation models for specific tasks has become increasingly challenging due to the computational and storage demands associated with full parameter updates. Parameter-Efficient Fine-Tuning (PEFT) methods address this issue by updating only a small subset of model parameters using adapter modules. In this work, we propose Quantum-Inspired Adapters, a PEFT approach inspired by Hamming-weight preserving quantum circuits from the quantum machine learning literature. These models can be both expressive and parameter-efficient by operating in a combinatorially large space while simultaneously preserving orthogonality in weight parameters. We test our proposed adapters by adapting large language models and large vision transformers on benchmark datasets. Our method can achieve 99.2% of the performance of existing fine-tuning methods such as LoRA with a 44x parameter compression on benchmarks like GLUE and VTAB. Compared to existing orthogonal fine-tuning methods such as OFT or BOFT, we achieve 98% relative performance with 25x fewer parameters. This demonstrates competitive performance paired with a significant reduction in trainable parameters. Through ablation studies, we determine that combining multiple Hamming-weight orders with orthogonality and matrix compounding are essential for performant fine-tuning. Our findings suggest that Quantum-Inspired Adapters offer a promising direction for efficient adaptation of language and vision models in resource-constrained environments.
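The ingredients named in the abstract (orthogonal weight updates, Hamming-weight-preserving structure, matrix compounding) can be illustrated with a small numerical sketch. The code below is an illustrative assumption, not the authors' implementation: it composes Givens rotations (the classical analogue of RBS gates) into an orthogonal adapter with O(n) parameters, then uses the k-th compound matrix to show how the same parameters induce an orthogonal action on the combinatorially larger weight-k subspace.

```python
# Illustrative sketch only: a "quantum-inspired" orthogonal adapter built from
# Givens rotations (RBS-gate analogues). Function names and structure are
# assumptions for exposition, not the paper's code.
import numpy as np
from itertools import combinations

def givens(n, i, j, theta):
    """n x n Givens rotation mixing coordinates i and j."""
    g = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    g[i, i], g[j, j] = c, c
    g[i, j], g[j, i] = -s, s
    return g

def orthogonal_from_angles(n, angles):
    """Compose nearest-neighbour Givens rotations into one orthogonal matrix.

    One angle per adjacent pair, so the adapter has O(n) trainable
    parameters rather than O(n^2)."""
    q = np.eye(n)
    for i, theta in zip(range(n - 1), angles):
        q = givens(n, i, i + 1, theta) @ q
    return q

def compound_matrix(q, k):
    """k-th compound (matrix of k x k minors): the action of q on the
    Hamming-weight-k subspace, of dimension C(n, k)."""
    n = q.shape[0]
    subsets = list(combinations(range(n), k))
    comp = np.empty((len(subsets), len(subsets)))
    for a, rows in enumerate(subsets):
        for b, cols in enumerate(subsets):
            comp[a, b] = np.linalg.det(q[np.ix_(rows, cols)])
    return comp

# Toy usage: an orthogonal update applied multiplicatively to a frozen weight,
# in the spirit of orthogonal fine-tuning, with very few parameters.
n = 8
rng = np.random.default_rng(0)
angles = rng.normal(scale=0.1, size=n - 1)   # trainable parameters
W_frozen = rng.normal(size=(n, n))           # pretrained weight (frozen)
Q = orthogonal_from_angles(n, angles)
W_adapted = Q @ W_frozen                     # orthogonal adaptation
print(np.allclose(Q @ Q.T, np.eye(n)))       # orthogonality preserved

Q2 = compound_matrix(Q, 2)                   # 28 x 28 action on weight-2 subspace
print(np.allclose(Q2 @ Q2.T, np.eye(Q2.shape[0])))  # still orthogonal
```

The compound-matrix step is what makes the construction parameter-frugal in this sketch: the same n - 1 angles that define Q also define its orthogonal action on every higher-order subspace, which is one plausible reading of the "multi-order" coupling described above.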
Problem

Research questions and friction points this paper is trying to address.

Efficient fine-tuning of large models
Reduced computational and storage demands
Quantum-inspired adapters for parameter compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum-Inspired Adapters
Parameter-Efficient Fine-Tuning
Hamming-weight preserving circuits
πŸ”Ž Similar Papers
No similar papers found.
Snehal Raj
LIP6, CNRS, Sorbonne UniversitΓ©, 4 Place Jussieu, 75005 Paris, France; QC Ware, Palo Alto, USA and Paris, France.
Brian Coyle
Fujitsu Research of Europe
Quantum computing · Quantum machine learning · Quantum cryptography