ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of modeling subtle facial expressions and skin details while maintaining real-time rendering efficiency in high-fidelity 3D head avatars, this paper proposes ScaffoldAvatar, a hierarchical framework integrating patch-wise expression modeling with 3D Gaussian Splatting. Its core innovation is the construction of a patch-based expression latent space, replacing conventional global representations to enable precise, localized control of fine facial motions. ScaffoldAvatar further incorporates the Scaffold-GS scene representation, patch-level geometric modeling, color-based densification, and a progressive training strategy, enabling real-time (≥30 FPS) rendering at high resolution (3K). Experiments demonstrate significant improvements over state-of-the-art methods in reconstruction fidelity, motion naturalness, and training convergence speed. The method shows strong practical potential for immersive telepresence and film production applications.

📝 Abstract
Generating high-fidelity real-time animated sequences of photorealistic 3D head avatars is important for many graphics applications, including immersive telepresence and movies. This is a challenging problem, particularly when rendering digital avatar close-ups that show a character's facial microfeatures and expressions. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple locally-defined facial expressions with 3D Gaussian splatting to enable the creation of ultra-high-fidelity, expressive, and photorealistic 3D head avatars. In contrast to previous works that operate on a global expression space, we condition our avatar's dynamics on patch-based local expression features and synthesize 3D Gaussians at a patch level. In particular, we leverage a patch-based geometric 3D face model to extract patch expressions and learn how to translate these into local dynamic skin appearance and motion by coupling the patches with anchor points of Scaffold-GS, a recent hierarchical scene representation. These anchors are then used to synthesize 3D Gaussians on-the-fly, conditioned on patch expressions and viewing direction. We employ color-based densification and progressive training to obtain high-quality results and faster convergence for high-resolution 3K training images. By leveraging patch-level expressions, ScaffoldAvatar consistently achieves state-of-the-art performance with visually natural motion, while encompassing diverse facial expressions and styles in real time.
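The anchor-to-Gaussian decoding described in the abstract can be sketched in miniature: each Scaffold-GS anchor carries a learned feature, and a decoder conditioned on that anchor's patch-expression code and the viewing direction spawns a small set of Gaussians (position offsets, opacity, color) on the fly. The toy NumPy version below uses a single random linear decoder in place of the paper's learned networks; all names, dimensions, and the decoder structure (`synthesize_gaussians`, `EXPR_DIM`, `K`, etc.) are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ANCHORS = 4  # anchor points coupled to face patches (toy size)
FEAT_DIM = 8     # per-anchor learned feature dimension (assumed)
EXPR_DIM = 6     # patch-expression code dimension (assumed)
K = 3            # Gaussians spawned per anchor (assumed)

# Toy "learned" state: anchor positions, per-anchor features, and one
# linear decoder standing in for the paper's conditioning networks.
anchor_pos = rng.normal(size=(NUM_ANCHORS, 3))
anchor_feat = rng.normal(size=(NUM_ANCHORS, FEAT_DIM))
# Maps [anchor feature | patch expression | view dir] ->
# K * (3 offset + 1 opacity + 3 color) per anchor.
W = rng.normal(size=(FEAT_DIM + EXPR_DIM + 3, K * 7)) * 0.1

def synthesize_gaussians(patch_expr, view_dir):
    """Spawn K Gaussians per anchor, conditioned on that anchor's
    patch-expression code and the (normalized) viewing direction."""
    view_dir = view_dir / np.linalg.norm(view_dir)
    cond = np.concatenate(
        [anchor_feat,
         patch_expr,                             # (NUM_ANCHORS, EXPR_DIM)
         np.tile(view_dir, (NUM_ANCHORS, 1))],
        axis=1)
    out = (cond @ W).reshape(NUM_ANCHORS, K, 7)
    centers = anchor_pos[:, None, :] + out[..., :3]  # offsets from anchors
    opacity = 1.0 / (1.0 + np.exp(-out[..., 3]))     # sigmoid -> (0, 1)
    color = 1.0 / (1.0 + np.exp(-out[..., 4:7]))     # sigmoid RGB
    return centers, opacity, color

centers, opacity, color = synthesize_gaussians(
    rng.normal(size=(NUM_ANCHORS, EXPR_DIM)),
    np.array([0.0, 0.0, 1.0]))
print(centers.shape, opacity.shape, color.shape)
```

Because the Gaussians are decoded per anchor rather than stored globally, a change in one patch's expression code only perturbs the Gaussians spawned near that patch, which is the localized-control property the paper emphasizes.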
Problem

Research questions and friction points this paper is trying to address.

Generate high-fidelity 3D head avatars for immersive applications
Render detailed facial microfeatures and expressions realistically
Enable real-time synthesis of diverse facial expressions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Patch-based local expression features for avatars
3D Gaussian splatting for ultra-high fidelity
Scaffold-GS anchors for dynamic skin appearance
Shivangi Aneja
Technical University of Munich, Germany and DisneyResearch|Studios, Switzerland
Sebastian Weiss
Disney Research Zürich
Computer Visualization and Graphics · Deep Learning
Irene Baeza
DisneyResearch|Studios, Switzerland
Prashanth Chandran
DisneyResearch|Studios, Switzerland
Gaspard Zoss
DisneyResearch|Studios, Switzerland
Matthias Nießner
Professor of Computer Science, Technical University of Munich
Computer Graphics · Computer Vision · Artificial Intelligence · Machine Learning
Derek Bradley
DisneyResearch|Studios
Computer Graphics · Computer Vision