Efficient Label Refinement for Face Parsing Under Extreme Poses Using 3D Gaussian Splatting

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Face parsing under extreme poses is severely limited by the scarcity of annotated training data. To address this, we propose the first multi-view consistent label optimization framework based on 3D Gaussian Splatting (3DGS). Without requiring ground-truth 3D annotations, it jointly fits 3DGS models to RGB images and their initial segmentation masks so that both share one geometric representation, then renders high-fidelity, pose-consistent segmentation labels across multiple views. These synthetic labels are subsequently used to fine-tune the parsing model. The method needs only a small set of initial images yet generates diverse training data. It significantly improves parsing accuracy under extreme poses while preserving performance on standard viewpoints, and it outperforms state-of-the-art methods on quantitative metrics and in human evaluation.

📝 Abstract
Accurate face parsing under extreme viewing angles remains a significant challenge due to limited labeled data in such poses. Manual annotation is costly and often impractical at scale. We propose a novel label refinement pipeline that leverages 3D Gaussian Splatting (3DGS) to generate accurate segmentation masks from noisy multiview predictions. By jointly fitting two 3DGS models, one to RGB images and one to their initial segmentation maps, our method enforces multiview consistency through shared geometry, enabling the synthesis of pose-diverse training data with only minimal post-processing. Fine-tuning a face parsing model on this refined dataset significantly improves accuracy on challenging head poses, while maintaining strong performance on standard views. Extensive experiments, including human evaluations, demonstrate that our approach achieves superior results compared to state-of-the-art methods, despite requiring no ground-truth 3D annotations and using only a small set of initial images. Our method offers a scalable and effective solution for improving face parsing robustness in real-world settings.
Problem

Research questions and friction points this paper is trying to address.

Addresses face parsing challenges under extreme head poses
Refines noisy segmentation labels using 3D Gaussian Splatting
Generates pose-diverse training data without ground-truth 3D annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging 3D Gaussian Splatting for label refinement
Jointly fitting RGB and segmentation models for consistency
Generating pose-diverse training data without 3D annotations
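
The consistency idea in the list above can be illustrated with a deliberately simplified toy. This is not 3DGS and not the authors' implementation: it swaps the learned Gaussian representation for a fixed set of shared 3D points and replaces optimization with a plain majority vote. The function names `refine_labels` and `render_view`, and the use of `None` for points hidden in a view, are assumptions made for illustration only.

```python
from collections import Counter


def refine_labels(per_view_labels):
    """Majority-vote one class id per shared 3D point across all views.

    per_view_labels: list of views; each view is a list holding one class id
    per point, or None where the point is not visible in that view.
    Ties resolve to the label that was seen first.
    """
    num_points = len(per_view_labels[0])
    refined = []
    for p in range(num_points):
        # Collect the noisy predictions of every view that sees point p.
        votes = [view[p] for view in per_view_labels if view[p] is not None]
        refined.append(Counter(votes).most_common(1)[0][0] if votes else None)
    return refined


def render_view(refined, visible):
    """Project the consensus labels into a view given a visibility mask."""
    return [label if vis else None for label, vis in zip(refined, visible)]


# Three noisy views of four shared points (class ids are arbitrary).
views = [
    [0, 1, 2, None],
    [0, 1, 1, 3],
    [0, 2, 2, 3],
]
refined = refine_labels(views)
novel_view = render_view(refined, [True, False, True, True])
```

Because every view reads its labels from the same underlying points, disagreements between views are resolved once and then stay consistent in any newly rendered view, which is the property the paper attributes to shared geometry.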
Ankit Gahlawat
International Institute of Information Technology, Bangalore (IIIT-B), Bengaluru, India
Anirban Mukherjee
International Institute of Information Technology, Bangalore (IIIT-B), Bengaluru, India
Dinesh Babu Jayagopi
Professor, HoD DSAI, National Teachers Awardee, IIIT Bangalore
Multimodal signal processing, Applied Machine Learning, Social Computing, Behavior Analytics