PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing

📅 2025-11-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses key challenges in single-image 3D head reconstruction and semantic editing—namely severe occlusion under arbitrary viewpoints, weak perceptual supervision, and ambiguous 3D semantic editing. We propose an end-to-end framework featuring a dual-branch encoder and ViT-based decoder, which lifts 2D features into 3D via iterative cross-attention. Geometry and appearance fidelity are ensured via Gaussian splatting rendering. To strengthen reconstruction accuracy, we introduce joint perceptual supervision from DINOv2 and SAM 2.1. Furthermore, we design a semantic-decoupled editing network: segmentation masks control geometric deformation, while text prompts or reference images modulate stylistic attributes—enabling intuitive, controllable 3D semantic editing. Experiments demonstrate state-of-the-art performance on novel-view synthesis, robustness to extreme viewpoints, and support for GUI-driven interactive geometry sculpting and stylization.
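The 2D-to-3D lifting step described above can be illustrated with a minimal NumPy sketch: a set of 3D query tokens repeatedly attends to 2D image features via cross-attention and absorbs their information through residual updates. This is an assumption-laden toy (single head, random stand-in weights instead of learned ViT parameters, no positional encodings), not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head cross-attention: 3D query tokens attend to 2D features."""
    Q = queries @ Wq          # (n_queries, d)
    K = context @ Wk          # (n_patches, d)
    V = context @ Wv          # (n_patches, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

def lift_2d_to_3d(image_feats, n_queries=8, dim=16, n_iters=3, seed=0):
    """Iteratively refine 3D tokens by attending to 2D image features.

    `image_feats` has shape (n_patches, dim). In the real model the queries
    and projection weights are learned; here random matrices stand in.
    """
    rng = np.random.default_rng(seed)
    queries = rng.standard_normal((n_queries, dim))
    for _ in range(n_iters):
        Wq, Wk, Wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
        # residual update: queries accumulate image evidence each iteration
        queries = queries + cross_attention(queries, image_feats, Wq, Wk, Wv)
    return queries
```

The residual form mirrors standard transformer decoder blocks: each iteration keeps what the queries already encode and adds newly attended 2D evidence.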

📝 Abstract
We present PercHead, a method for single-image 3D head reconstruction and semantic 3D editing - two tasks that are inherently challenging due to severe view occlusions, weak perceptual supervision, and the ambiguity of editing in 3D space. We develop a unified base model for reconstructing view-consistent 3D heads from a single input image. The model employs a dual-branch encoder followed by a ViT-based decoder that lifts 2D features into 3D space through iterative cross-attention. Rendering is performed using Gaussian Splatting. At the heart of our approach is a novel perceptual supervision strategy based on DINOv2 and SAM 2.1, which provides rich, generalized signals for both geometric and appearance fidelity. Our model achieves state-of-the-art performance in novel-view synthesis and exhibits exceptional robustness to extreme viewing angles compared to established baselines. Furthermore, this base model can be seamlessly extended for semantic 3D editing by swapping the encoder and finetuning the network. In this variant, we disentangle geometry and style through two distinct input modalities: a segmentation map to control geometry and either a text prompt or a reference image to specify appearance. We highlight the intuitive and powerful 3D editing capabilities of our model through a lightweight, interactive GUI, where users can effortlessly sculpt geometry by drawing segmentation maps and stylize appearance via natural language or image prompts. Project Page: https://antoniooroz.github.io/PercHead Video: https://www.youtube.com/watch?v=4hFybgTk4kE
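The perceptual supervision strategy compares rendered and ground-truth images in the feature spaces of pretrained vision models rather than in raw pixels. A minimal sketch of one plausible formulation: a cosine-distance loss over per-patch feature vectors, combined across two feature extractors. The loss form, the weighting, and the stand-in feature arrays are assumptions; the paper's actual loss design may differ, and real DINOv2/SAM 2.1 features would come from the pretrained networks.

```python
import numpy as np

def perceptual_loss(feats_render, feats_target, eps=1e-8):
    """Mean cosine distance between per-patch feature vectors.

    Both inputs have shape (n_patches, feat_dim); 0 means identical
    directions per patch, 2 means exactly opposite.
    """
    a = feats_render / (np.linalg.norm(feats_render, axis=-1, keepdims=True) + eps)
    b = feats_target / (np.linalg.norm(feats_target, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(a * b, axis=-1)))

def joint_perceptual_loss(dino_pair, sam_pair, w_dino=1.0, w_sam=1.0):
    """Weighted sum over two feature spaces (weights are illustrative)."""
    return (w_dino * perceptual_loss(*dino_pair)
            + w_sam * perceptual_loss(*sam_pair))
```

Supervising in the feature space of large pretrained models gives gradients that reflect semantic and structural similarity rather than per-pixel color agreement, which is why such losses remain informative under occlusion and viewpoint change.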
Problem

Research questions and friction points this paper is trying to address.

Reconstructing 3D heads from single images with view consistency
Enabling semantic 3D editing via geometry and appearance disentanglement
Overcoming occlusion and perceptual ambiguity in 3D reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-branch encoder with ViT-based decoder
Perceptual supervision using DINOv2 and SAM 2.1
Geometry and style disentanglement for editing