Controllable-LPMoE: Adapting to Challenging Object Segmentation via Dynamic Local Priors from Mixture-of-Experts

📅 2025-10-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Large-scale foundation models face prohibitive computational costs when fully fine-tuned for object segmentation, while existing prompt-based tuning methods lack sufficient semantic priors. To address this, we propose an efficient frozen-tuning framework grounded in dynamic local priors. Our method introduces a dynamic hybrid local prior extractor and a bidirectional interaction adapter, integrating heterogeneous convolutions, gated routing, and cosine-aligned deformable attention to adaptively generate semantically rich local priors. Additionally, we incorporate a dynamic Mixture-of-Experts (MoE) mechanism to modulate the frozen model’s local perception capability. With fewer than 0.1% trainable parameters, our approach surpasses 31 state-of-the-art methods across multiple binary segmentation benchmarks, achieving significant improvements in accuracy, generalization, and training efficiency. The code is publicly available.

πŸ“ Abstract
Large-scale foundation models provide powerful feature representations for downstream object segmentation tasks. However, when they are adapted to specific tasks through full-parameter fine-tuning, the enormous number of updated parameters often incurs significant computational overhead, creating a bottleneck in training efficiency. Although existing methods attempt to fine-tune frozen models by directly embedding trainable prompts, these prompts lack inherent semantic priors, limiting the adaptability of large-scale models. In this paper, we propose a novel dynamic priors-based fine-tuning paradigm with fewer trainable parameters, dubbed Controllable-LPMoE, which adaptively modulates frozen foundation models by dynamically controlling local priors to enhance fine-grained perception for specific segmentation tasks. More specifically, we construct a lightweight dynamic mixed local priors extractor that captures diverse local priors from input images through heterogeneous convolutions, while employing a gating network to dynamically output the expert priors required for subsequent fine-tuning. Furthermore, we design a bi-directional interaction adapter that employs cosine-aligned deformable attention and channel-oriented adaptive scale enhancement to exchange and restructure information between frozen and trainable features, achieving efficient fine-tuning. Extensive experiments validate the superiority of our Controllable-LPMoE approach (code: https://github.com/CSYSI/Controllable-LPMoE), demonstrating excellent segmentation performance compared to 31 state-of-the-art (SOTA) methods and adaptability to multiple binary object segmentation tasks.
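The gated Mixture-of-Experts routing described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the expert functions below are simple stand-ins for the paper's heterogeneous convolutions, and the gating matrix `W` and top-k size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "experts": stand-ins for the heterogeneous convolutions
# in the paper, each producing a different local-prior view of x.
def expert_center(x):   return x - x.mean()                 # zero-centred view
def expert_scale(x):    return x / (np.abs(x).max() + 1e-6) # rescaled view
def expert_identity(x): return x                            # pass-through view

EXPERTS = [expert_center, expert_scale, expert_identity]

def gate(x, W, top_k=2):
    """Gating network: scores each expert from the input feature,
    keeps the top-k, and renormalises their weights (sparse routing)."""
    logits = W @ x
    keep = np.argsort(logits)[-top_k:]          # indices of top-k experts
    w = np.exp(logits[keep] - logits[keep].max())
    w /= w.sum()                                # softmax over kept experts
    return keep, w

def moe_prior(x, W, top_k=2):
    """Dynamic local prior: gate-weighted mixture of the selected experts."""
    keep, w = gate(x, W, top_k)
    return sum(wi * EXPERTS[i](x) for wi, i in zip(w, keep))

x = rng.normal(size=8)                 # toy feature vector
W = rng.normal(size=(len(EXPERTS), 8)) # toy gating weights
prior = moe_prior(x, W)
```

Because only the top-k experts contribute, the prior adapts per input while the unselected experts add no compute, which is the efficiency argument behind sparse MoE routing.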
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in large model fine-tuning for segmentation
Enhancing fine-grained perception through dynamic local priors
Improving adaptability to challenging binary object segmentation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic local priors adaptively modulate frozen foundation models
Lightweight extractor captures diverse priors via heterogeneous convolutions
Bi-directional adapter exchanges frozen and trainable features via cosine-aligned deformable attention
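The cosine-aligned interaction between frozen and trainable features can be sketched as below. This is a hedged NumPy toy, not the paper's adapter: it keeps only the cosine-similarity alignment step (the deformable-sampling and channel-scale components are omitted), and the token shapes are illustrative assumptions.

```python
import numpy as np

def cosine_align(frozen, trainable, eps=1e-8):
    """Re-express frozen backbone tokens as attention-weighted mixtures of
    trainable adapter tokens, with attention scores given by cosine
    similarity (a simplified stand-in for cosine-aligned attention)."""
    f = frozen / (np.linalg.norm(frozen, axis=-1, keepdims=True) + eps)
    t = trainable / (np.linalg.norm(trainable, axis=-1, keepdims=True) + eps)
    sim = f @ t.T                                   # cosine scores in [-1, 1]
    attn = np.exp(sim)
    attn /= attn.sum(axis=-1, keepdims=True)        # softmax over trainable tokens
    return attn @ trainable                         # aligned feature update

rng = np.random.default_rng(1)
frozen = rng.normal(size=(4, 16))     # 4 frozen tokens, 16-dim
trainable = rng.normal(size=(6, 16))  # 6 trainable tokens, 16-dim
out = cosine_align(frozen, trainable)
```

Normalising both token sets first makes the attention depend on feature direction rather than magnitude, which is why cosine alignment is a natural fit when the frozen and trainable branches have very different activation scales.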
Yanguang Sun
PCA Lab, Nanjing University of Science and Technology, Nanjing, China
Jiawei Lian
Jian Yang
PCA Lab, VCIP, College of Computer Science, Nankai University, Tianjin, China
Lei Luo
Kansas State University