Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts

📅 2025-07-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenge of balancing shared representation learning and task specialization in multi-task dense prediction, this paper proposes the Fine-Grained Mixture-of-Experts (FGMoE) architecture. Methodologically, FGMoE (1) partitions intra-task experts along the MLP's intermediate dimension to enable fine-grained representation disentanglement; (2) introduces cross-context shared experts and a global expert that supports adaptive cross-task knowledge transfer, establishing a three-level collaborative routing mechanism; and (3) integrates a decoder-only parameter-efficient fine-tuning strategy. Evaluated on NYUD-v2 and PASCAL-Context, FGMoE achieves state-of-the-art performance across semantic segmentation, depth estimation, and other dense prediction tasks, while using significantly fewer parameters than existing MoE baselines. These results empirically validate that fine-grained expert design effectively balances shared and task-specific representations, yielding both improved accuracy and strong generalization across heterogeneous dense prediction tasks.
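The key parameter-efficiency idea in (1) can be illustrated with a minimal NumPy sketch: slicing a single MLP's intermediate dimension into fine-grained experts, so the expert pool adds no parameters over the dense layer. All names and sizes below are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

# Hypothetical sketch: partition one MLP's intermediate (hidden) dimension
# into n_experts fine-grained experts. Dimensions are toy values.
d_model, d_hidden, n_experts = 8, 32, 4
rng = np.random.default_rng(0)

W1 = rng.standard_normal((d_model, d_hidden))  # up-projection
W2 = rng.standard_normal((d_hidden, d_model))  # down-projection

# Each expert owns a contiguous slice of the hidden dimension, so the
# experts together hold exactly the dense MLP's parameters.
slices = np.split(np.arange(d_hidden), n_experts)

def expert_forward(x, idx):
    """Run only expert `idx`: a slice of the full MLP."""
    cols = slices[idx]
    h = np.maximum(x @ W1[:, cols], 0.0)  # ReLU on the expert's slice
    return h @ W2[cols, :]

x = rng.standard_normal(d_model)

# Because ReLU acts elementwise, summing all experts reproduces the
# dense MLP output exactly; a router would instead weight a subset.
dense = np.maximum(x @ W1, 0.0) @ W2
moe_sum = sum(expert_forward(x, i) for i in range(n_experts))
assert np.allclose(dense, moe_sum)
```

The equality check shows why this partitioning is "free" in parameter count: routing merely selects which hidden-dimension slices fire for a given token and task.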

๐Ÿ“ Abstract
Multi-task learning (MTL) for dense prediction has shown promising results but still faces challenges in balancing shared representations with task-specific specialization. In this paper, we introduce a novel Fine-Grained Mixture of Experts (FGMoE) architecture that explores MoE-based MTL models through a combination of three key innovations and fine-tuning. First, we propose intra-task experts that partition along intermediate hidden dimensions of MLPs, enabling finer decomposition of task information while maintaining parameter efficiency. Second, we introduce shared experts that consolidate common information across different contexts of the same task, reducing redundancy and allowing routing experts to focus on unique aspects. Third, we design a global expert that facilitates adaptive knowledge transfer across tasks based on both input features and task requirements, promoting beneficial information sharing while preventing harmful interference. In addition, we adopt a fine-tuning approach that improves parameter efficiency by training only the decoder parameters. Extensive experimental results show that the proposed FGMoE uses fewer parameters and significantly outperforms current MoE-based competitive MTL models on two dense prediction datasets (i.e., NYUD-v2, PASCAL-Context) in various metrics.
Problem

Research questions and friction points this paper is trying to address.

Balancing shared and task-specific representations in multi-task dense prediction
Improving parameter efficiency with fine-grained mixture of experts
Enhancing knowledge transfer across tasks while preventing interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intra-task experts partition hidden MLP dimensions
Shared experts consolidate common task information
Global expert enables adaptive cross-task transfer
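The three innovations above can be read as a three-level expert combination per token: sparsely routed intra-task experts, an always-on shared expert, and a global expert for cross-task transfer. The following NumPy sketch is a toy stand-in under those assumptions; the gating function, expert form, and top-k choice are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical three-level combination: routed experts + shared expert
# + global expert. All experts are toy linear maps for illustration.
rng = np.random.default_rng(1)
d, n_routed, top_k = 16, 4, 2

routed = [rng.standard_normal((d, d)) for _ in range(n_routed)]
shared_W = rng.standard_normal((d, d))   # consolidates common task info
global_W = rng.standard_normal((d, d))   # cross-task knowledge transfer
gate_W = rng.standard_normal((d, n_routed))

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgmoe_layer(x):
    # Level 1: sparse top-k routing over fine-grained intra-task experts.
    scores = softmax(x @ gate_W)
    top = np.argsort(scores)[-top_k:]
    out = sum(scores[i] * (x @ routed[i]) for i in top)
    # Level 2: the shared expert is always active within the task.
    out += x @ shared_W
    # Level 3: the global expert adds adaptively transferable knowledge.
    out += x @ global_W
    return out

y = fgmoe_layer(rng.standard_normal(d))
```

Keeping the shared and global experts outside the router is the design point: routed experts are freed to specialize because common and cross-task information flows through the dense paths.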
Yangyang Xu
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Xi Ye
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Duo Su
Tsinghua University
Deep Learning · Computer Vision