🤖 AI Summary
To address the challenge of balancing shared representation learning and task specialization in multi-task dense prediction, this paper proposes the Fine-Grained Mixture-of-Experts (FGMoE) architecture. Methodologically, FGMoE (1) partitions intra-task experts along the MLP's intermediate dimension to enable fine-grained representation disentanglement; (2) introduces cross-context shared experts and a global expert that supports adaptive cross-task knowledge transfer, establishing a three-level collaborative routing mechanism; and (3) integrates a decoder-only parameter-efficient fine-tuning strategy. Evaluated on NYUD-v2 and PASCAL-Context, FGMoE achieves state-of-the-art performance across semantic segmentation, depth estimation, and other dense prediction tasks while using significantly fewer parameters than existing MoE baselines. These results empirically validate that fine-grained expert design effectively balances shared and task-specific representations, yielding both improved accuracy and strong generalization across heterogeneous dense prediction tasks.
📄 Abstract
Multi-task learning (MTL) for dense prediction has shown promising results but still faces challenges in balancing shared representations with task-specific specialization. In this paper, we introduce a novel Fine-Grained Mixture of Experts (FGMoE) architecture that explores MoE-based MTL models through a combination of three key innovations and fine-tuning. First, we propose intra-task experts that partition along the intermediate hidden dimensions of MLPs, enabling finer decomposition of task information while maintaining parameter efficiency. Second, we introduce shared experts that consolidate common information across different contexts of the same task, reducing redundancy and allowing routing experts to focus on unique aspects. Third, we design a global expert that facilitates adaptive knowledge transfer across tasks based on both input features and task requirements, promoting beneficial information sharing while preventing harmful interference. In addition, we adopt a fine-tuning approach that improves parameter efficiency by training only the decoder parameters. Extensive experiments show that the proposed FGMoE uses fewer parameters and significantly outperforms competitive MoE-based MTL models on two dense prediction datasets (*i.e.,* NYUD-v2 and PASCAL-Context) across various metrics.
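To make the central idea concrete, the following is a minimal sketch, not the authors' implementation, of what "partitioning experts along the MLP's intermediate dimension" could look like: a standard MLP's hidden dimension is split into contiguous slices, each slice acts as one fine-grained expert, and a softmax router weights the top-k slices per token. All dimensions, weight names, and the routing scheme here are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumption, not the paper's code): one MLP whose
# intermediate dimension d_ff is split into n_experts equal slices, so
# experts share the up/down projection matrices but own disjoint columns.
rng = np.random.default_rng(0)

d_model, d_ff, n_experts, top_k = 8, 32, 4, 2
slice_dim = d_ff // n_experts  # each expert owns a contiguous slice of d_ff

W_in = rng.standard_normal((d_model, d_ff)) * 0.1    # shared up-projection
W_out = rng.standard_normal((d_ff, d_model)) * 0.1   # shared down-projection
W_router = rng.standard_normal((d_model, n_experts)) * 0.1  # token-to-expert router

def fgmoe_mlp(x):
    """x: (tokens, d_model) -> (tokens, d_model) via top-k expert slices."""
    logits = x @ W_router                              # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)              # softmax routing weights
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]            # indices of selected experts
        for e in top:
            sl = slice(e * slice_dim, (e + 1) * slice_dim)
            h = np.maximum(x[t] @ W_in[:, sl], 0.0)    # ReLU on this expert's slice
            out[t] += probs[t, e] * (h @ W_out[sl, :]) # weighted expert output
    return out

tokens = rng.standard_normal((5, d_model))
y = fgmoe_mlp(tokens)
print(y.shape)  # (5, 8)
```

Because the slices reuse one pair of projection matrices, adding more experts refines the routing granularity without increasing parameter count, which matches the parameter-efficiency claim in the abstract; the shared and global experts described above would add further routing paths on top of this base.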