ModuSeg: Decoupling Object Discovery and Semantic Retrieval for Training-Free Weakly Supervised Segmentation

📅 2026-04-08
🤖 AI Summary
Weakly supervised semantic segmentation methods typically couple semantic recognition with object localization, which leads models to focus on sparse discriminative regions and fail to produce complete, precise masks. To address this, the work proposes ModuSeg, a training-free framework that decouples the two steps: a generic mask proposer generates geometric candidate regions, and a semantic foundation model is used to build an offline feature bank, so that segmentation is reformulated as a non-parametric feature retrieval task. Explicitly separating object discovery from semantic assignment is intended to mitigate pseudo-label noise and removes the need for multi-stage retraining. Semantic boundary purification and soft-masked feature aggregation are additionally introduced to improve prototype quality. The authors report highly competitive performance on standard benchmarks while preserving fine boundaries without any fine-tuning.
📝 Abstract
Weakly supervised semantic segmentation aims to achieve pixel-level predictions using image-level labels. Existing methods typically entangle semantic recognition and object localization, which often leads models to focus exclusively on sparse discriminative regions. Although foundation models show immense potential, many approaches still follow the tightly coupled optimization paradigm, struggling to effectively alleviate pseudo-label noise and often relying on time-consuming multi-stage retraining or unstable end-to-end joint optimization. To address the above challenges, we present ModuSeg, a training-free weakly supervised semantic segmentation framework centered on explicitly decoupling object discovery and semantic assignment. Specifically, we integrate a general mask proposer to extract geometric proposals with reliable boundaries, while leveraging semantic foundation models to construct an offline feature bank, transforming segmentation into a non-parametric feature retrieval process. Furthermore, we propose semantic boundary purification and soft-masked feature aggregation strategies to effectively mitigate boundary ambiguity and quantization errors, thereby extracting high-quality category prototypes. Extensive experiments demonstrate that the proposed decoupled architecture better preserves fine boundaries without parameter fine-tuning and achieves highly competitive performance on standard benchmark datasets. Code is available at https://github.com/Autumnair007/ModuSeg.
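The retrieval idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`soft_masked_pool`, `retrieve_labels`), the use of cosine similarity with a nearest-neighbor lookup, and the toy feature map are all assumptions made for illustration. The mask proposer and the foundation-model features are replaced by hand-built arrays.

```python
import numpy as np

def soft_masked_pool(features, soft_mask, eps=1e-8):
    """Weighted average of an (H, W, D) feature map under an (H, W) soft mask.

    Hypothetical helper: pixels contribute in proportion to their mask
    weight instead of being hard-thresholded to 0 or 1.
    """
    weights = soft_mask[..., None]                      # (H, W, 1)
    return (features * weights).sum(axis=(0, 1)) / (soft_mask.sum() + eps)

def l2_normalize(x, eps=1e-8):
    """Scale each row vector to unit length so dot products are cosines."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def retrieve_labels(region_feats, bank_feats, bank_labels):
    """Give each region the label of its most cosine-similar bank entry.

    Assumption: a plain 1-nearest-neighbor lookup stands in for whatever
    retrieval rule the paper actually uses.
    """
    sims = l2_normalize(region_feats) @ l2_normalize(bank_feats).T  # (R, B)
    return bank_labels[sims.argmax(axis=1)]

# Toy 4x4 feature map: left half looks like class A, right half like class B.
features = np.zeros((4, 4, 2))
features[:, :2] = [1.0, 0.0]
features[:, 2:] = [0.0, 1.0]

# Two mask proposals; the left one bleeds softly into one right-hand column.
left_mask = np.zeros((4, 4))
left_mask[:, :2] = 1.0
left_mask[:, 2] = 0.2
right_mask = np.zeros((4, 4))
right_mask[:, 2:] = 1.0

region_feats = np.stack(
    [soft_masked_pool(features, m) for m in (left_mask, right_mask)]
)

# Offline "feature bank": one prototype per class, with made-up class ids.
bank_feats = np.array([[0.9, 0.1], [0.1, 0.9]])
bank_labels = np.array([3, 7])

labels = retrieve_labels(region_feats, bank_feats, bank_labels)
print(labels)
```

In this sketch no parameters are trained: the only learned quantities are the precomputed features, and changing the class set only requires changing the bank.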
Problem

Research questions and friction points this paper is trying to address.

weakly supervised segmentation
object discovery
semantic retrieval
pseudo-label noise
decoupling
Innovation

Methods, ideas, or system contributions that make the work stand out.

decoupling
training-free
weakly supervised segmentation
feature retrieval
foundation models
Qingze He
South China University of Technology
Fagui Liu
South China University of Technology, Pengcheng Laboratory
Dengke Zhang
second-year PhD student at South China University of Technology
Computer Vision, Image Segmentation
Qingmao Wei
South China University of Technology, Pengcheng Laboratory
Quan Tang
Pengcheng Laboratory
Computer Vision, Anomaly Detection, Deep Learning