Latent Dynamics for Full Body Avatar Animation

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing pose-driven full-body avatars struggle to model dynamic details such as loose clothing, whose deformation depends on motion history, inertia, and contact, none of which the current pose alone captures. This work augments a pose-conditioned 3D Gaussian avatar with a transformer-based decoder and a dynamics residual latent. At inference, a learned latent dynamics model evolves this latent from a short pose history and the previous latent state. Each update is decomposed into driving, restoring, and dissipative forces, which yields temporally coherent, history-dependent rollouts, exposes controls such as stiffness, and lets different initial conditions produce diverse yet plausible trajectories. The approach requires neither an explicit garment template nor a test-time physics simulator. On nine captured sequences of everyday motion with diverse loose garments, quantitative metrics and a perceptual user study show improved animation quality over recent data-driven baselines.

📝 Abstract

Pose-driven full-body avatars built on neural rendering produce high-quality novel views of a captured subject. Yet loose clothing and other dynamic elements deform in ways pose alone cannot explain: the same pose can correspond to many different states, because their motion depends on history, inertia, and contact. Explicit simulation and layered-garment methods can model such dynamics, but they require either a dedicated garment template, which raw multi-view capture does not naturally provide, or a test-time physics simulator with non-trivial runtime cost. A parallel line of work learns data-driven clothing avatars that avoid explicit garment layers. These methods add an auxiliary latent for variation beyond pose; at inference, they fix it, regress it from pose, or retrieve it from training data, without explicitly modeling how the latent evolves with its own dynamics. Additionally, even in everyday motion with loose clothing, existing architectures often struggle to capture fine-grained detail, producing blurry renderings and temporal artifacts. We augment a pose-conditioned 3D Gaussian avatar with a transformer-based decoder and a dynamics residual latent that captures temporal appearance and geometry variation beyond the driving signals. At inference, a learned latent dynamics model evolves the residual latent from a short pose history and the previous latent state. The model decomposes each update into driving, restoring, and dissipative forces, producing temporally coherent, history-dependent rollouts with negligible added cost. Different initial conditions yield diverse yet plausible motion trajectories, and the force decomposition exposes controls such as stiffness. Across nine captured sequences of everyday motion with diverse loose garments, quantitative metrics and a perceptual user study show improved animation quality over recent data-driven baselines.
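The abstract does not state the update equations, so the following is only an illustrative sketch of what a force-decomposed latent update could look like, assuming a damped-spring analogy. Every name here (`latent_step`, `rollout`, `stiffness`, `damping`, `dt`) is hypothetical, and the `drive` input is a stand-in for whatever the paper's learned model would compute from the pose history; none of this is the authors' implementation.

```python
import numpy as np


def latent_step(z, v, drive, stiffness=4.0, damping=1.0, dt=1.0 / 30.0):
    """One semi-implicit Euler step of a toy force-decomposed latent update.

    z, v  : latent state and latent velocity (assumed representation)
    drive : driving force; a placeholder for a learned function of pose history
    """
    restoring = -stiffness * z    # pulls the latent back toward a rest state at 0
    dissipative = -damping * v    # removes energy so oscillations die out
    accel = drive + restoring + dissipative
    v_next = v + dt * accel       # update velocity first ...
    z_next = z + dt * v_next      # ... then position, using the new velocity
    return z_next, v_next


def rollout(z0, v0, drives, **kwargs):
    """Roll the latent forward, one step per row of `drives`."""
    z, v = z0, v0
    trajectory = []
    for drive in drives:
        z, v = latent_step(z, v, drive, **kwargs)
        trajectory.append(z)
    return np.stack(trajectory)   # shape: (num_steps, latent_dim)


if __name__ == "__main__":
    no_drive = np.zeros((300, 8))
    # Same driving signal, different initial latents: the rollouts differ
    # early on and both settle as the dissipative term removes energy.
    traj_a = rollout(np.ones(8), np.zeros(8), no_drive)
    traj_b = rollout(-0.5 * np.ones(8), np.zeros(8), no_drive)
    print(np.abs(traj_a[:30] - traj_b[:30]).max())
    print(np.abs(traj_a[-1]).max())
```

In this toy model, the rollout depends on the previous state as well as the current input, which mirrors the history dependence described above. Under a constant drive the latent settles at `drive / stiffness`, so raising `stiffness` reduces the displacement. This is one simple way a stiffness control could behave, not a claim about how the paper exposes it.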

Problem

Research questions and friction points this paper is trying to address.

latent dynamics

full-body avatar

loose clothing

temporal coherence

neural rendering

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent dynamics

full-body avatar

neural rendering