Toward Real-World High-Precision Image Matting and Segmentation

📅 2026-01-17
🤖 AI Summary
This work addresses three limitations of current high-precision image matting and dichotomous segmentation methods in real-world scenarios: a focus on single salient objects, class-agnostic interactive designs that generalize poorly across categories, and domain shift caused by reliance on synthetic training data. All three hinder the recovery of fine structures such as hair. To overcome these challenges, we propose the Foreground Consistent Learning model (FCLM), which combines depth-aware distillation with domain-invariant learning to improve foreground representation and to reduce the impact of scarce annotations and domain shift. We further introduce an Object-Oriented Decoder that accepts both visual and language prompts to predict the referred target. Experimental results show that the method outperforms state-of-the-art methods both quantitatively and qualitatively.

๐Ÿ“ Abstract
High-precision scene parsing tasks, including image matting and dichotomous segmentation, aim to accurately predict masks with extremely fine details (such as hair). Most existing methods focus on salient, single foreground objects. While interactive methods allow for target adjustment, their class-agnostic design restricts generalization across different categories. Furthermore, the scarcity of high-quality annotation has led to a reliance on inharmonious synthetic data, resulting in poor generalization to real-world scenarios. To this end, we propose a Foreground Consistent Learning model, dubbed as FCLM, to address the aforementioned issues. Specifically, we first introduce a Depth-Aware Distillation strategy where we transfer the depth-related knowledge for better foreground representation. Considering the data dilemma, we term the processing of synthetic data as domain adaptation problem where we propose a domain-invariant learning strategy to focus on foreground learning. To support interactive prediction, we contribute an Object-Oriented Decoder that can receive both visual and language prompts to predict the referring target. Experimental results show that our method quantitatively and qualitatively outperforms SOTA methods.
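The abstract does not state the matting formulation or how the synthetic data is produced. As background only, the sketch below assumes the standard alpha-compositing model, I = alpha * F + (1 - alpha) * B, and assumes that synthetic samples are built by blending a foreground over an unrelated background, which is a common practice but is not confirmed by this page. The function name `composite` and the toy pixel values are hypothetical, chosen for illustration.

```python
# Illustrative sketch, not the authors' pipeline.
# Assumption: the standard compositing model I = alpha * F + (1 - alpha) * B.

def composite(alpha, foreground, background):
    """Blend two same-sized grayscale images (nested lists of floats in [0, 1])
    using a per-pixel alpha matte of the same shape."""
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha, foreground, background)
    ]


# A 1x3 strip: an opaque foreground pixel, a half-transparent pixel
# (as along a hair strand), and a pure background pixel.
alpha = [[1.0, 0.5, 0.0]]
foreground = [[0.8, 0.8, 0.8]]
background = [[0.2, 0.2, 0.2]]

image = composite(alpha, foreground, background)
```

Matting inverts this relation: given only the blended image, the model estimates the alpha matte. Fractional alpha values along thin structures are what make fine details such as hair hard to recover, and a foreground pasted over a mismatched background is one plausible reading of what the abstract calls inharmonious synthetic data.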
Problem

Research questions and friction points this paper is trying to address.

image matting
dichotomous segmentation
real-world generalization
synthetic data
interactive segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foreground Consistent Learning
Depth-Aware Distillation
Domain-Invariant Learning
Object-Oriented Decoder
Interactive Image Matting