🤖 AI Summary
Face parsing under extreme poses is severely limited by the scarcity of annotated training data. To address this, we propose the first multi-view consistent label optimization framework based on 3D Gaussian Splatting (3DGS): without requiring ground-truth 3D annotations, it jointly fits models to RGB images and initial segmentation masks through a shared geometric representation, then renders high-fidelity, pose-consistent segmentation labels across multiple views; these synthetic labels are subsequently used to fine-tune the parsing model. Our method requires only a small set of initially annotated images yet generates high-fidelity, diverse training data. It significantly improves parsing accuracy under extreme poses while preserving performance on standard viewpoints, consistently outperforming state-of-the-art methods across quantitative metrics and human evaluation.
📝 Abstract
Accurate face parsing under extreme viewing angles remains a significant challenge due to limited labeled data in such poses. Manual annotation is costly and often impractical at scale. We propose a novel label refinement pipeline that leverages 3D Gaussian Splatting (3DGS) to generate accurate segmentation masks from noisy multi-view predictions. By jointly fitting two 3DGS models, one to RGB images and one to their initial segmentation maps, our method enforces multi-view consistency through shared geometry, enabling the synthesis of pose-diverse training data with only minimal post-processing. Fine-tuning a face parsing model on this refined dataset significantly improves accuracy on challenging head poses, while maintaining strong performance on standard views. Extensive experiments, including human evaluations, demonstrate that our approach achieves superior results compared to state-of-the-art methods, despite requiring no ground-truth 3D annotations and using only a small set of initial images. Our method offers a scalable and effective solution for improving face parsing robustness in real-world settings.
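The core idea of the pipeline, two renderers tied together by one shared geometry, can be illustrated with a deliberately minimal toy. The sketch below is not the paper's 3DGS implementation: it uses 1-D Gaussians on a line instead of 3-D splats, a scalar "appearance" signal in place of RGB, and a binary field in place of a segmentation mask. All names (`mu`, `sig`, `c`, `l`, `lam`, `lr`) are illustrative assumptions. The point it demonstrates is that the geometry parameters (`mu`) receive gradients from *both* the appearance loss and the label loss, so fitting the two signals jointly keeps their renders consistent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "scene": a smooth signal stands in for RGB, a step function for a mask.
x = np.linspace(0.0, 1.0, 200)
rgb_target = np.exp(-((x - 0.3) ** 2) / 0.005) + 0.5 * np.exp(-((x - 0.7) ** 2) / 0.01)
label_target = (x > 0.5).astype(float)

# Shared geometry: Gaussian positions used by BOTH renderers.
K = 8
mu = rng.uniform(0.0, 1.0, K)        # shared, optimized positions
sig = np.full(K, 0.08)               # fixed widths, for simplicity
c = rng.normal(0.0, 0.1, K)          # per-Gaussian appearance weights
l = rng.normal(0.0, 0.1, K)          # per-Gaussian label weights

def basis(mu):
    # Row i: Gaussian i evaluated on the whole grid.
    return np.exp(-((x[None, :] - mu[:, None]) ** 2) / (2 * sig[:, None] ** 2))

lam, lr = 1.0, 0.05                  # label-loss weight, step size (assumed values)
losses = []
for step in range(500):
    G = basis(mu)                    # (K, len(x))
    er = c @ G - rgb_target          # appearance residual
    es = l @ G - label_target        # label residual
    losses.append(np.mean(er ** 2) + lam * np.mean(es ** 2))
    # Analytic gradients; note grad_mu mixes BOTH residuals -> shared geometry.
    dG_dmu = G * (x[None, :] - mu[:, None]) / sig[:, None] ** 2
    grad_c = 2 * (G @ er) / len(x)
    grad_l = 2 * lam * (G @ es) / len(x)
    grad_mu = (2 * c * (dG_dmu @ er) + 2 * lam * l * (dG_dmu @ es)) / len(x)
    c -= lr * grad_c
    l -= lr * grad_l
    mu -= lr * grad_mu

print(f"joint loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because `mu` is shared, the appearance fit regularizes where the label field can place its mass, which is the mechanism the abstract describes for suppressing noise in the initial per-view masks.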