🤖 AI Summary
To address the challenge of balancing shared representation learning and task specialization in multi-task dense prediction, this paper proposes the Fine-Grained Mixture-of-Experts (FGMoE) architecture. Methodologically, FGMoE (1) partitions intra-task experts along the MLP's intermediate dimension to enable fine-grained representation disentanglement; (2) introduces cross-context shared experts and a global expert that supports adaptive cross-task knowledge transfer, establishing a three-level collaborative routing mechanism; and (3) integrates a decoder-only parameter-efficient fine-tuning strategy. Evaluated on NYUD-v2 and PASCAL-Context, FGMoE achieves state-of-the-art performance across semantic segmentation, depth estimation, and other dense prediction tasks while using significantly fewer parameters than existing MoE baselines. These results empirically validate that fine-grained expert design effectively balances shared and task-specific representations, yielding both improved accuracy and strong generalization across heterogeneous dense prediction tasks.
📄 Abstract
Multi-task learning (MTL) for dense prediction has shown promising results but still faces challenges in balancing shared representations with task-specific specialization. In this paper, we introduce a novel Fine-Grained Mixture of Experts (FGMoE) architecture that explores MoE-based MTL models through a combination of three key innovations and fine-tuning. First, we propose intra-task experts that partition along the intermediate hidden dimensions of MLPs, enabling finer decomposition of task information while maintaining parameter efficiency. Second, we introduce shared experts that consolidate common information across different contexts of the same task, reducing redundancy and allowing routing experts to focus on unique aspects. Third, we design a global expert that facilitates adaptive knowledge transfer across tasks based on both input features and task requirements, promoting beneficial information sharing while preventing harmful interference. In addition, we adopt a fine-tuning approach that improves parameter efficiency by training only the decoder parameters. Extensive experiments show that the proposed FGMoE uses fewer parameters and significantly outperforms competitive MoE-based MTL models on two dense prediction datasets (*i.e.,* NYUD-v2 and PASCAL-Context) across various metrics.
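To make the central idea concrete, the following is a minimal sketch, not the authors' implementation, of what "partitioning experts along the MLP's intermediate dimension" could look like: a standard MLP's hidden dimension is split into contiguous slices, each slice acts as one fine-grained expert, and a softmax router weights the top-k slices per token. All dimensions, weight names, and the routing scheme here are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumption, not the paper's code): one MLP whose
# intermediate dimension d_ff is split into n_experts equal slices, so
# experts share the up/down projection matrices but own disjoint columns.
rng = np.random.default_rng(0)

d_model, d_ff, n_experts, top_k = 8, 32, 4, 2
slice_dim = d_ff // n_experts  # each expert owns a contiguous slice of d_ff

W_in = rng.standard_normal((d_model, d_ff)) * 0.1    # shared up-projection
W_out = rng.standard_normal((d_ff, d_model)) * 0.1   # shared down-projection
W_router = rng.standard_normal((d_model, n_experts)) * 0.1  # token-to-expert router

def fgmoe_mlp(x):
    """x: (tokens, d_model) -> (tokens, d_model) via top-k expert slices."""
    logits = x @ W_router                              # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)              # softmax routing weights
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]            # indices of selected experts
        for e in top:
            sl = slice(e * slice_dim, (e + 1) * slice_dim)
            h = np.maximum(x[t] @ W_in[:, sl], 0.0)    # ReLU on this expert's slice
            out[t] += probs[t, e] * (h @ W_out[sl, :]) # weighted expert output
    return out

tokens = rng.standard_normal((5, d_model))
y = fgmoe_mlp(tokens)
print(y.shape)  # (5, 8)
```

Because the slices reuse one pair of projection matrices, adding more experts refines the routing granularity without increasing parameter count, which matches the parameter-efficiency claim in the abstract; the shared and global experts described above would add further routing paths on top of this base.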