🤖 AI Summary
Conventional single-expert parameter-efficient fine-tuning (PEFT) methods struggle to capture task-data diversity, limiting expressivity and adaptability. Method: This paper proposes TuckA, a multi-expert PEFT framework based on Tucker decomposition. It constructs a compact three-way tensor, where each frontal slice serves as an independent expert; hierarchical expert grouping, batch-level dynamic routing, and data-aware initialization are introduced to ensure load balancing and efficient parameter expansion. Contribution/Results: Compared with standard low-rank adaptation (LoRA), TuckA achieves significantly enhanced model expressivity and task adaptability while maintaining comparable parameter counts. Extensive experiments demonstrate state-of-the-art performance across diverse benchmarks—including natural language understanding, image classification, and mathematical reasoning—outperforming leading PEFT approaches. These results validate TuckA’s effectiveness, generalizability, and scalability across modalities and tasks.
📝 Abstract
Efficiently fine-tuning pre-trained models for downstream tasks is a key challenge in the era of foundation models. Parameter-efficient fine-tuning (PEFT) presents a promising solution, achieving performance comparable to full fine-tuning by updating only a small number of adaptation weights per layer. Traditional PEFT methods typically rely on a single expert, where the adaptation weight is a low-rank matrix. However, for complex tasks, the data's inherent diversity poses a significant challenge for such models, as a single adaptation weight cannot adequately capture the features of all samples. To address this limitation, we explore how to integrate multiple small adaptation experts into a compact structure that outperforms a single large adapter. Specifically, we propose Tucker Adaptation (TuckA), a method with four key properties: (i) We use Tucker decomposition to create a compact 3D tensor where each frontal slice naturally serves as an expert. The low-rank nature of this decomposition ensures that the number of parameters scales efficiently as more experts are added. (ii) We introduce a hierarchical strategy that organizes these experts into groups at different granularities, allowing the model to capture both local and global data patterns. (iii) We develop an efficient batch-level routing mechanism, which reduces the router's parameter size by a factor of $L$ compared to routing at every adapted layer (where $L$ is the number of adapted layers). (iv) We propose data-aware initialization to achieve loss-free expert load balancing based on theoretical analysis. Extensive experiments on benchmarks in natural language understanding, image classification, and mathematical reasoning demonstrate the efficacy of TuckA, offering a new and effective solution to the PEFT problem.
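To make the core idea concrete, here is a minimal NumPy sketch of a Tucker-decomposed multi-expert adapter in the spirit of property (i): a 3D update tensor is factored into a small core and three factor matrices, each frontal slice acts as one expert, and a single batch-level softmax mixes the experts. All dimensions, rank values, and variable names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, n_experts = 64, 32, 4   # adapted-layer dims and expert count (hypothetical)
r1, r2, r3 = 8, 8, 4                 # Tucker ranks (hypothetical values)

# Tucker factors of the 3D adaptation tensor of shape (d_out, d_in, n_experts):
G = rng.normal(size=(r1, r2, r3))        # shared core tensor
U = rng.normal(size=(d_out, r1))         # output-side factor matrix
V = rng.normal(size=(d_in, r2))          # input-side factor matrix
P = rng.normal(size=(n_experts, r3))     # expert-side factor matrix

# Note the parameter scaling: a dense tensor costs d_out*d_in*n_experts entries,
# while this factorization costs r1*r2*r3 + d_out*r1 + d_in*r2 + n_experts*r3,
# so each additional expert adds only r3 parameters.

def expert_delta(e):
    """Frontal slice e of the reconstructed tensor = one expert's weight update."""
    core_e = np.tensordot(G, P[e], axes=([2], [0]))   # contract expert mode -> (r1, r2)
    return U @ core_e @ V.T                           # (d_out, d_in)

# Batch-level routing: one softmax weight vector per batch, shared by all
# adapted layers (this is what saves the factor-of-L router parameters).
logits = rng.normal(size=n_experts)
w = np.exp(logits) / np.exp(logits).sum()

# Effective adaptation weight is the routed mixture of expert slices.
delta_W = sum(w[e] * expert_delta(e) for e in range(n_experts))
print(delta_W.shape)  # (64, 32)
```

Because the reconstruction is linear in the expert-side factor, mixing experts with weights `w` is equivalent to contracting the core with the single routed vector `w @ P`, so the mixture never needs to materialize all expert slices.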