Large-Scale High-Quality 3D Gaussian Head Reconstruction from Multi-View Captures

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the efficient reconstruction of high-quality, generalizable 3D head models from large-scale multi-view captures. The authors propose HeadsUp, a feed-forward method built on an encoder-decoder architecture that compresses the input views into a compact latent representation and decodes it into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. Because the UV parameterization decouples the number of Gaussians from the number and resolution of the input images, the model can be trained with many high-resolution views and needs no test-time optimization. Trained and evaluated on an internal dataset of more than 10,000 subjects, HeadsUp achieves state-of-the-art reconstruction quality, generalizes to novel identities, and supports two downstream applications: generating novel 3D identities and animating heads with expression blendshapes.

📝 Abstract

We propose HeadsUp, a scalable feed-forward method for reconstructing high-quality 3D Gaussian heads from large-scale multi-camera setups. Our method employs an efficient encoder-decoder architecture that compresses input views into a compact latent representation. This latent representation is then decoded into a set of UV-parameterized 3D Gaussians anchored to a neutral head template. This UV representation decouples the number of 3D Gaussians from the number and resolution of input images, enabling training with many high-resolution input views. We train and evaluate our model on an internal dataset with more than 10,000 subjects, which is an order of magnitude larger than existing multi-view human head datasets. HeadsUp achieves state-of-the-art reconstruction quality and generalizes to novel identities without test-time optimization. We extensively analyze the scaling behavior of our model across identities, views, and model capacity, revealing practical insights for quality-compute trade-offs. Finally, we highlight the strength of our latent space by showcasing two downstream applications: generating novel 3D identities and animating the 3D heads with expression blendshapes.
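
The decoupling described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the encoder, the decoder, and every name and size below (`UV_RES`, `LATENT_DIM`, `encode`, `decode`, the 14-channel Gaussian layout) are assumptions made for illustration. The only point it shows is that when Gaussians are laid out on a fixed UV grid over a template, their count is set by the UV resolution and not by how many input views there are or how large they are.

```python
import numpy as np

# All sizes are hypothetical; the abstract does not state them.
UV_RES = 64        # UV map resolution: one Gaussian per texel
LATENT_DIM = 32    # size of the compact latent code
PARAMS = 14        # 3 offset + 3 log-scale + 4 rotation + 1 opacity + 3 color

rng = np.random.default_rng(0)

# Stand-in for a neutral head template: one 3D anchor position per UV texel.
TEMPLATE = rng.normal(size=(UV_RES, UV_RES, 3))

# Stand-in for a learned decoder: a single random linear map.
W_DEC = rng.normal(scale=0.01, size=(LATENT_DIM, UV_RES * UV_RES * PARAMS))


def encode(views):
    """Toy encoder: pool (V, H, W, 3) views into a fixed-size latent vector."""
    # Average over pixels and over views, so the result has the same shape
    # for any number of views and any image resolution.
    pooled = views.mean(axis=(0, 1, 2))          # shape (3,)
    return np.resize(pooled, LATENT_DIM)         # repeat values to fill the latent


def decode(latent):
    """Toy decoder: latent vector -> per-texel Gaussian parameter maps."""
    maps = (latent @ W_DEC).reshape(UV_RES, UV_RES, PARAMS)
    return {
        "position": TEMPLATE + maps[..., 0:3],   # offsets from the template anchors
        "log_scale": maps[..., 3:6],
        "rotation": maps[..., 6:10],             # unnormalized quaternion
        "opacity": maps[..., 10:11],
        "color": maps[..., 11:14],
    }


def num_gaussians(n_views, height, width):
    """Run the toy pipeline on random views and count the output Gaussians."""
    views = rng.random((n_views, height, width, 3))
    gaussians = decode(encode(views))
    return gaussians["position"].reshape(-1, 3).shape[0]


few_small = num_gaussians(n_views=2, height=16, width=16)
many_large = num_gaussians(n_views=8, height=64, width=64)
print(few_small, many_large)
```

Both calls produce `UV_RES * UV_RES` Gaussians. In this sketch, adding views or raising their resolution changes only the encoder's input cost, which is the property the abstract credits for making training with many high-resolution views practical.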

Problem

Research questions and friction points this paper is trying to address.

3D Gaussian reconstruction

multi-view capture

scalable 3D head modeling

high-quality 3D reconstruction

generalizable head representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting

UV-parameterized representation

multi-view reconstruction