🤖 AI Summary
This paper addresses the challenge of high-fidelity, multi-style 3D avatar generation from monocular video. It introduces ToonifyGB, an efficient two-stage framework that extends the StyleGAN-based Toonify stylization pipeline to 3D Gaussian blendshape avatars. In Stage 1, an improved StyleGAN synthesizes a stylized video from the input frames, removing standard StyleGAN's requirement of cropping aligned faces at a fixed resolution and yielding a more stable video sequence. In Stage 2, a stylized neutral head model and a set of expression-dependent Gaussian blendshapes are learned from this video; combining the neutral model with the blendshapes lets ToonifyGB render stylized avatars with arbitrary expressions in real time while better capturing high-frequency details. The method is validated on a benchmark dataset with two styles, Arcane and Pixar.
📝 Abstract
The introduction of 3D Gaussian blendshapes has enabled real-time reconstruction of animatable head avatars from monocular video. Toonify, a StyleGAN-based framework, is widely used for facial image stylization. To extend Toonify to synthesizing diverse stylized 3D head avatars with Gaussian blendshapes, we propose an efficient two-stage framework, ToonifyGB. In Stage 1 (stylized video generation), we employ an improved StyleGAN to generate a stylized video from the input video frames, removing standard StyleGAN's preprocessing requirement of cropping aligned faces at a fixed resolution. This yields a more stable video, which allows the Gaussian blendshapes to better capture high-frequency details of the video frames and to efficiently generate high-quality animation in the next stage. In Stage 2 (Gaussian blendshape synthesis), we learn a stylized neutral head model and a set of expression blendshapes from the generated video. By combining the neutral head model with the expression blendshapes, ToonifyGB can efficiently render stylized avatars with arbitrary expressions. We validate the effectiveness of ToonifyGB on a benchmark dataset using two styles: Arcane and Pixar.
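The abstract does not give the combination rule explicitly, but blendshape models are conventionally linear: the animated avatar is the neutral model plus a coefficient-weighted sum of expression offsets. The sketch below illustrates that standard linear combination only; the function and variable names are illustrative, not from the paper, and per-Gaussian attributes are reduced to positions for simplicity.

```python
import numpy as np

def blend_avatar(neutral, deltas, weights):
    """Linear blendshape combination: avatar = neutral + sum_i w_i * delta_i.

    neutral: (N, D) per-Gaussian attributes of the neutral head (e.g. positions)
    deltas:  (K, N, D) expression blendshape offsets relative to the neutral
    weights: (K,) expression coefficients (e.g. tracked from a driving video)
    """
    weights = np.asarray(weights, dtype=float)
    # tensordot contracts the K axis: sum_i weights[i] * deltas[i]
    return neutral + np.tensordot(weights, deltas, axes=1)

# Toy example: 2 Gaussians with 3D positions, 2 expression blendshapes.
neutral = np.zeros((2, 3))
deltas = np.stack([np.ones((2, 3)), 2.0 * np.ones((2, 3))])
avatar = blend_avatar(neutral, deltas, [0.5, 0.25])
# each coordinate becomes 0.5 * 1 + 0.25 * 2 = 1.0
```

Because the combination is a single weighted sum over precomputed offsets, new expressions require no network evaluation at render time, which is what makes the blendshape representation amenable to real-time animation.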