🤖 AI Summary
This work proposes an autoregressive framework for generating high-quality, animatable 4D Gaussian avatars from a single portrait image. Built upon a decoder-only Transformer, the method sequentially generates 3D Gaussian point clouds while jointly predicting skeletal binding weights, enabling adaptive modeling of complex geometry and appearance. A latent-feature-conditioned Gaussian decoder then recovers the full set of rendering attributes. According to the authors, this is the first approach to apply autoregressive sequence modeling to 4D avatar generation, allowing point density to be adjusted dynamically according to object complexity and improving reconstruction fidelity through inter-stage feature interaction. Experiments demonstrate that the proposed method outperforms existing techniques in both visual quality and controllability.
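As a rough illustration of the first stage described above, the sketch below shows how a decoder-only Transformer could autoregressively emit Gaussian centers together with per-point skeletal binding weights, stopping adaptively once enough points have been produced. Every module name, dimension, and the stop criterion here are assumptions made for illustration, not details of the paper's actual implementation.

```python
# Minimal sketch (assumed architecture, not the paper's code): a decoder-only
# Transformer that autoregressively emits 3D Gaussian centers plus per-point
# skeletal binding (skinning) weights, with an adaptive stopping head.
import torch
import torch.nn as nn

class ARPointGenerator(nn.Module):
    def __init__(self, d_model=512, n_joints=24, n_layers=12, n_heads=8, max_points=4096):
        super().__init__()
        self.img_proj = nn.Linear(768, d_model)               # portrait image tokens -> model dim (assumed)
        self.point_embed = nn.Linear(3 + n_joints, d_model)   # embed previously generated points
        self.pos_embed = nn.Embedding(max_points + 1, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.xyz_head = nn.Linear(d_model, 3)                 # next Gaussian center
        self.skin_head = nn.Linear(d_model, n_joints)         # binding weights over skeleton joints
        self.stop_head = nn.Linear(d_model, 1)                # decides when enough points were emitted
        self.max_points = max_points

    @torch.no_grad()
    def generate(self, image_tokens):
        """image_tokens: (B, N_img, 768) features from an image encoder (assumed)."""
        B = image_tokens.shape[0]
        memory = self.img_proj(image_tokens)
        # start token: an all-zero "point" that seeds the sequence
        seq = torch.zeros(B, 1, 3 + self.skin_head.out_features, device=memory.device)
        points, weights, latents = [], [], []
        for _ in range(self.max_points):
            tok = self.point_embed(seq) + self.pos_embed(
                torch.arange(seq.shape[1], device=memory.device))
            h = self.decoder(tok, memory)[:, -1]               # latent feature of the newest step
            xyz = self.xyz_head(h)
            w = self.skin_head(h).softmax(-1)                  # per-point skeletal binding weights
            points.append(xyz); weights.append(w); latents.append(h)
            if torch.sigmoid(self.stop_head(h)).mean() > 0.5:  # adaptive point count
                break
            seq = torch.cat([seq, torch.cat([xyz, w], -1).unsqueeze(1)], dim=1)
        return torch.stack(points, 1), torch.stack(weights, 1), torch.stack(latents, 1)
```

The binding weights predicted per point can then drive animation in the usual linear-blend-skinning manner, with each Gaussian following a weighted combination of joint transforms.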
📝 Abstract
We introduce AvatarPointillist, a novel framework for generating dynamic 4D Gaussian avatars from a single portrait image. At the core of our method is a decoder-only Transformer that autoregressively generates a point cloud for 3D Gaussian Splatting. This sequential approach allows for precise, adaptive construction, dynamically adjusting point density and the total number of points based on the subject's complexity. During point generation, the AR model also jointly predicts per-point binding information, enabling realistic animation. After generation, a dedicated Gaussian decoder converts the points into complete, renderable Gaussian attributes. We demonstrate that conditioning the decoder on the latent features from the AR generator enables effective interaction between stages and markedly improves fidelity. Extensive experiments validate that AvatarPointillist produces high-quality, photorealistic, and controllable avatars. We believe this autoregressive formulation represents a new paradigm for avatar generation, and we will release our code to inspire future research.
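To picture the second stage, the sketch below shows one plausible form of a Gaussian decoder conditioned on the AR generator's latent features: each point's position and latent vector are mapped to the remaining rendering attributes (scale, rotation, opacity, color). The attribute split and activations follow common 3D Gaussian Splatting conventions and are assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed, not the paper's code): decode full 3D Gaussian
# attributes from each generated point and its AR latent feature.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianAttributeDecoder(nn.Module):
    def __init__(self, d_latent=512, d_hidden=256):
        super().__init__()
        # condition on the AR generator's latent feature together with the point position
        self.mlp = nn.Sequential(
            nn.Linear(3 + d_latent, d_hidden), nn.SiLU(),
            nn.Linear(d_hidden, d_hidden), nn.SiLU(),
        )
        self.offset_head = nn.Linear(d_hidden, 3)    # small refinement of the Gaussian center
        self.scale_head = nn.Linear(d_hidden, 3)
        self.rot_head = nn.Linear(d_hidden, 4)       # quaternion
        self.opacity_head = nn.Linear(d_hidden, 1)
        self.color_head = nn.Linear(d_hidden, 3)     # could be SH coefficients instead

    def forward(self, xyz, latent):
        """xyz: (B, N, 3) generated centers; latent: (B, N, d_latent) AR features."""
        h = self.mlp(torch.cat([xyz, latent], dim=-1))
        return {
            "xyz": xyz + 0.01 * torch.tanh(self.offset_head(h)),
            "scale": torch.exp(self.scale_head(h)).clamp(max=0.1),
            "rotation": F.normalize(self.rot_head(h), dim=-1),
            "opacity": torch.sigmoid(self.opacity_head(h)),
            "color": torch.sigmoid(self.color_head(h)),
        }
```

In a design like this, the AR latents carry per-point context from the generation stage into attribute decoding, which is one way the inter-stage feature interaction described in the abstract could be realized.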