🤖 AI Summary
To address the high computational and memory overhead of fine-tuning large language models, this paper proposes an attention-mechanism reconstruction method inspired by graph signal processing. Specifically, it models multi-head self-attention as a graph convolutional filter subspace and dynamically composes more expressive attention subspaces by learning a small set of trainable coefficients over the fixed basis filters. This work is the first to reinterpret Transformer attention from the perspective of graph filter subspaces. It further introduces residual coefficient parameterization and filter-level dropout to improve training stability and generalization. The method adds negligible parameters (<0.1% of the base model) and outperforms mainstream parameter-efficient fine-tuning (PEFT) approaches such as LoRA and Adapter, achieving state-of-the-art performance across multiple downstream tasks. Moreover, it supports plug-and-play integration without architectural modification.
📝 Abstract
Transformer-based large pre-trained models have shown remarkable generalization ability, and various parameter-efficient fine-tuning (PEFT) methods have been proposed to customize these models for downstream tasks with minimal computational and memory budgets. Previous PEFT methods are primarily designed from a tensor-decomposition perspective, seeking to tune the linear transformations effectively by finding the smallest subset of parameters to train. Our study adopts an orthogonal view by representing the attention operation as a graph convolution and formulating the multi-head attention maps as a convolutional filter subspace, with each attention map as a subspace element. In this paper, we propose to tune large pre-trained transformers by learning a small set of combination coefficients that construct a more expressive filter subspace from the original multi-head attention maps. We show analytically and experimentally that the tuned filter subspace can effectively expand the feature space of the multi-head attention and further enhance the capacity of transformers. We further stabilize the fine-tuning with a residual parameterization of the tunable subspace coefficients, and enhance generalization with a regularization design that applies dropout directly to the tunable coefficients during training. The tunable coefficients require a tiny number of parameters and can be combined with previous PEFT methods in a plug-and-play manner. Extensive experiments show that our approach achieves superior performance to PEFT baselines with negligible additional parameters.
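The core recombination step described above can be sketched as follows. This is an illustrative NumPy toy, not the authors' implementation: each tuned attention map is assumed to be a learned linear combination of the frozen multi-head attention maps, with a residual parameterization (identity plus a zero-initialized `delta`) so fine-tuning starts from the original attention, and dropout applied directly to the trainable coefficients during training. The function name and shapes are hypothetical.

```python
import numpy as np

def recombine(attn_maps, delta, training=False, p_drop=0.1, rng=None):
    """Recombine frozen attention maps via trainable subspace coefficients.

    attn_maps: (H, n, n) array of the H pre-trained attention maps,
               viewed as elements of a graph filter subspace (frozen).
    delta:     (H, H) trainable residual coefficients, zero-initialized.
    """
    H = attn_maps.shape[0]
    if training and rng is not None:
        # Regularization: dropout applied to the tunable coefficients,
        # with inverted scaling so the expected value is unchanged.
        mask = (rng.random(delta.shape) >= p_drop) / (1.0 - p_drop)
        delta = delta * mask
    # Residual parameterization: coeff = I + delta, so with delta = 0
    # the original multi-head attention is recovered exactly.
    coeff = np.eye(H) + delta
    # Each new map is a linear combination of the original H maps.
    return np.einsum('ij,jmn->imn', coeff, attn_maps)
```

At initialization (`delta = 0`) the call is an identity over the pre-trained maps, which is what makes the tunable coefficients safe to bolt onto a frozen model; only the H×H `delta` matrix (a tiny parameter count relative to the base model) is trained.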