FOCUS: Towards Universal Foreground Segmentation

📅 2025-01-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address task fragmentation, insufficient background semantic modeling, and ambiguous boundary delineation in generic foreground segmentation, this paper proposes the first unified framework explicitly incorporating background semantics. Methodologically, we design an edge-guided multi-scale semantic network to enhance boundary awareness; introduce a multimodal contrastive distillation mechanism that integrates vision-language priors to strengthen foreground-background semantic relationships; and develop a unified decoder enabling seamless multi-task compatibility. Evaluated across 13 diverse datasets and five foreground segmentation tasks—including portrait, object, shadow, reflection, and transparent object segmentation—our approach consistently surpasses single-task state-of-the-art methods. It achieves significant average improvements across standard metrics, demonstrating both strong generalization capability and technical advancement in unified foreground segmentation.

Technology Category

Application Category

📝 Abstract
Foreground segmentation is a fundamental task in computer vision, encompassing various subdivision tasks. Previous research has typically designed task-specific architectures for each task, leading to a lack of unification. Moreover, they primarily focus on recognizing foreground objects without effectively distinguishing them from the background. In this paper, we emphasize the importance of the background and its relationship with the foreground. We introduce FOCUS, the Foreground ObjeCts Universal Segmentation framework that can handle multiple foreground tasks. We develop a multi-scale semantic network using the edge information of objects to enhance image features. To achieve boundary-aware segmentation, we propose a novel distillation method, integrating the contrastive learning strategy to refine the prediction mask in multi-modal feature space. We conduct extensive experiments on a total of 13 datasets across 5 tasks, and the results demonstrate that FOCUS consistently outperforms the state-of-the-art task-specific models on most metrics.
Problem

Research questions and friction points this paper is trying to address.

Image Segmentation
Object Recognition
Foreground-Background Discrimination
Innovation

Methods, ideas, or system contributions that make the work stand out.

FOCUS method
Edge information utilization
Enhanced learning strategy
🔎 Similar Papers
No similar papers found.
Zuyao You
Zuyao You
Fudan University
Computer VisionLarge Multimodal Models
L
Lingyu Kong
Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University; Shanghai Collaborative Innovation Center of Intelligent Visual Computing
Lingchen Meng
Lingchen Meng
Qwen Team, Alibaba Group; Fudan University
Large Multimodal Models
Zuxuan Wu
Zuxuan Wu
Fudan University