DisFaceRep: Representation Disentanglement for Co-occurring Facial Components in Weakly Supervised Face Parsing

📅 2025-08-02
🤖 AI Summary
Weakly supervised face parsing (WSFP) relies solely on image-level labels and natural language descriptions to reduce annotation costs. However, the high co-occurrence and visual similarity of facial components lead to ambiguous activations and suboptimal segmentation performance. To address this, we propose an explicit–implicit disentanglement framework: (1) a co-occurring component disentanglement strategy explicitly reduces the dataset-level bias caused by correlated components, and (2) a text-guided disentanglement loss uses linguistic priors to implicitly enhance the semantic separation of facial parts. The approach combines weakly supervised semantic segmentation, representation disentanglement, and multimodal (label and text) supervision. Extensive experiments on CelebAMask-HQ, LaPa, and Helen show significant improvements over existing weakly supervised semantic segmentation methods, indicating that the disentanglement mechanism improves both component localization and part discrimination.
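To make the co-occurrence problem concrete, here is a minimal sketch (not the DisFaceRep implementation) that measures how often facial components appear together in image-level labels. The component names, the `cooccurrence_rate` helper, and the toy label sets are hypothetical. A conditional rate near 1.0 means the tag for one component almost always comes with the tag for another, so a classifier trained only on those tags has little signal for separating their activations.

```python
from itertools import combinations

# Hypothetical label vocabulary for illustration; real WSFP datasets define their own.
COMPONENTS = ["skin", "nose", "l_eye", "r_eye", "l_brow", "r_brow", "u_lip", "l_lip"]


def cooccurrence_rate(labels, components=COMPONENTS):
    """Estimate P(b present | a present) for every ordered pair (a, b).

    labels: one set of component names per image (image-level tags only).
    Returns a dict mapping (a, b) -> conditional co-occurrence rate.
    """
    count = {c: 0 for c in components}
    joint = {(a, b): 0 for a in components for b in components if a != b}
    for present in labels:
        for c in present:
            count[c] += 1
        # Count each unordered pair once, then record it in both directions.
        for a, b in combinations(sorted(present), 2):
            joint[(a, b)] += 1
            joint[(b, a)] += 1
    return {
        pair: (joint[pair] / count[pair[0]] if count[pair[0]] else 0.0)
        for pair in joint
    }


if __name__ == "__main__":
    toy_labels = [
        {"nose", "l_eye", "r_eye"},
        {"nose", "l_eye", "r_eye", "u_lip"},
        {"nose", "u_lip"},
    ]
    rates = cooccurrence_rate(toy_labels)
    print(rates[("l_eye", "r_eye")])  # 1.0: the left eye is never tagged without the right eye
    print(rates[("nose", "u_lip")])
```

In this toy data the left and right eyes always co-occur, which is exactly the situation where image-level supervision alone cannot tell the two apart.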

📝 Abstract
Face parsing aims to segment facial images into key components such as eyes, lips, and eyebrows. Existing methods rely on dense pixel-level annotations, which are expensive and labor-intensive to obtain. To reduce annotation cost, we introduce Weakly Supervised Face Parsing (WSFP), a new task setting that performs dense facial component segmentation using only weak supervision, such as image-level labels and natural language descriptions. WSFP poses unique challenges because facial components frequently co-occur and are visually similar, which leads to ambiguous activations and degraded parsing performance. To address this, we propose DisFaceRep, a representation disentanglement framework that separates co-occurring facial components through both explicit and implicit mechanisms. Specifically, we introduce a co-occurring component disentanglement strategy that explicitly reduces dataset-level bias, and a text-guided component disentanglement loss that implicitly guides component separation using language supervision. Extensive experiments on CelebAMask-HQ, LaPa, and Helen demonstrate the difficulty of WSFP and the effectiveness of DisFaceRep, which significantly outperforms existing weakly supervised semantic segmentation methods. The code will be released at https://github.com/CVI-SZU/DisFaceRep.
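The abstract describes the text-guided component disentanglement loss only at a high level. As one hedged illustration of how language supervision can push components apart, the sketch below swaps in a generic contrastive formulation (cross-entropy over cosine similarities): a pooled visual feature for each component is pulled toward the text embedding of its own name and away from the names of the other components. The function names, the temperature value, and the assumption that per-component visual and text features are already available are all hypothetical; the paper's actual loss may be defined differently.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (small epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)


def text_guided_disentangle_loss(region_feats, text_feats, temperature=0.07):
    """Hypothetical contrastive loss: region_feats[k] should match text_feats[k]
    (the embedding of component k's name) and not any other component's text.

    region_feats, text_feats: equal-length lists of equal-dimension vectors.
    Returns the mean cross-entropy over components.
    """
    total = 0.0
    for k, r in enumerate(region_feats):
        logits = [cosine(r, t) / temperature for t in text_feats]
        m = max(logits)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[k]  # -log softmax probability of the correct component
    return total / len(region_feats)


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]  # toy embeddings for two component names
    print(text_guided_disentangle_loss([[1.0, 0.0], [0.0, 1.0]], text))  # near 0: disentangled
    print(text_guided_disentangle_loss([[1.0, 1.0], [1.0, 1.0]], text))  # log(2): entangled
```

When both component features are identical (fully entangled), the loss sits at log(2) for two components. It drops toward zero only when each feature aligns with its own component name, which is the separation pressure the text guidance is meant to supply.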
Problem

Reducing annotation cost for face parsing with weak supervision
Separating co-occurring facial components to improve parsing accuracy
Leveraging text guidance to disentangle visually similar facial features
Innovation

Weakly supervised face parsing with image-level labels and natural language descriptions
Disentanglement framework for co-occurring facial components
Text-guided loss for implicit component separation
Xiaoqin Wang
School of Computer Science and Software Engineering, Shenzhen University
Xianxu Hou
Xi'an Jiaotong-Liverpool University
Deep Learning, Computer Vision
Meidan Ding
Shenzhen University
Computer Vision, Medical Image Analysis
Junliang Chen
Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University
Kaijun Deng
School of Computer Science and Software Engineering, Shenzhen University
Jinheng Xie
National University of Singapore
Deep Learning, Computer Vision, Generative AI
Linlin Shen
Shenzhen University
Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis