Latent Dynamics for Full Body Avatar Animation

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing pose-driven full-body avatars struggle to model dynamic details such as loose clothing, because these deformations depend on motion history, inertia, and contact, which the current pose alone does not capture. This work proposes a dynamic avatar framework based on 3D Gaussian splatting that adds a Transformer-based decoder and a dynamics residual latent. A learned latent dynamics model evolves this latent over time, predicting each new state from a short history of poses and the previous latent state. Each latent update is decomposed into driving, restoring, and dissipative forces. This keeps rollouts temporally coherent and history-dependent, lets different initial conditions produce diverse yet plausible motion trajectories, and exposes controls such as stiffness. Without explicit garment templates or test-time physics simulation, the approach outperforms recent data-driven baselines on nine captured sequences of everyday motion with loose clothing. Both quantitative metrics and a perceptual user study show better temporal coherence, finer detail, and sharper renderings.
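One way to read the driving/restoring/dissipative split is as a damped, driven second-order recurrence on the latent. The formulation below is a hypothetical illustration, not the paper's equation. Here $\mathbf{z}_t$ is the dynamics residual latent and $\mathbf{v}_t$ an assumed latent velocity. $f_\theta$ is a stand-in for the learned driving term applied to the last $H$ poses $\mathbf{p}$. $K$ and $C$ are assumed restoring and damping matrices, $\mathbf{z}^{\ast}$ is an assumed rest latent, $s$ is a user-set stiffness multiplier, and $\Delta t$ is the step size.

```latex
\begin{aligned}
\mathbf{v}_{t+1} &= \mathbf{v}_t + \Delta t\,\Big[
  \underbrace{f_\theta(\mathbf{p}_{t-H+1:t})}_{\text{driving}}
  \;\underbrace{-\, s\,K\,(\mathbf{z}_t - \mathbf{z}^{\ast})}_{\text{restoring}}
  \;\underbrace{-\, C\,\mathbf{v}_t}_{\text{dissipative}}
\Big] \\
\mathbf{z}_{t+1} &= \mathbf{z}_t + \Delta t\,\mathbf{v}_{t+1}
\end{aligned}
```

Under this reading, raising $s$ strengthens the pull back toward $\mathbf{z}^{\ast}$ and produces stiffer, faster oscillation. Raising $C$ suppresses oscillation, and the driving term is the only part that depends on the pose history.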
📝 Abstract
Pose-driven full-body avatars built on neural rendering produce high-quality novel views of a captured subject. Yet loose clothing and other dynamic elements deform in ways pose alone cannot explain: the same pose can correspond to many different states, because their motion depends on history, inertia, and contact. Explicit simulation and layered-garment methods can model such dynamics, but they require either a dedicated garment template, which raw multi-view capture does not naturally provide, or a test-time physics simulator with non-trivial runtime cost. A parallel line of work learns data-driven clothing avatars that avoid explicit garment layers. These methods add an auxiliary latent for variation beyond pose; at inference, they fix it, regress it from pose, or retrieve it from training data, without explicitly modeling how the latent evolves with its own dynamics. Additionally, even in everyday motion with loose clothing, existing architectures often struggle to capture fine-grained detail, producing blurry renderings and temporal artifacts. We augment a pose-conditioned 3D Gaussian avatar with a transformer-based decoder and a dynamics residual latent that captures temporal appearance and geometry variation beyond the driving signals. At inference, a learned latent dynamics model evolves the residual latent from a short pose history and the previous latent state. The model decomposes each update into driving, restoring, and dissipative forces, producing temporally coherent, history-dependent rollouts with negligible added cost. Different initial conditions yield diverse yet plausible motion trajectories, and the force decomposition exposes controls such as stiffness. Across nine captured sequences of everyday motion with diverse loose garments, quantitative metrics and a perceptual user study show improved animation quality over recent data-driven baselines.
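A minimal NumPy sketch of an autoregressive latent rollout with a force decomposition is shown below. It rests on several assumptions: the latent carries an explicit velocity, the driving force is a fixed random linear map of the flattened pose history (standing in for a learned network), and the restoring and dissipative forces are linear spring and damping terms. `LatentDynamics`, `LATENT_DIM`, `POSE_DIM`, `HISTORY`, and every parameter value are hypothetical. This is not the authors' implementation.

```python
import numpy as np

LATENT_DIM = 8  # hypothetical size of the dynamics residual latent
POSE_DIM = 6    # hypothetical per-frame pose feature size
HISTORY = 4     # hypothetical length of the short pose-history window


class LatentDynamics:
    """Toy latent dynamics split into driving, restoring, and dissipative forces.

    All matrices and constants are hypothetical stand-ins for learned components.
    """

    def __init__(self, stiffness=1.0, damping=0.5, dt=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical stand-in for a learned network: pose history -> driving force.
        self.W_drive = 0.1 * rng.standard_normal((LATENT_DIM, HISTORY * POSE_DIM))
        self.K = np.eye(LATENT_DIM)            # restoring (spring) matrix
        self.C = damping * np.eye(LATENT_DIM)  # dissipative (damping) matrix
        self.stiffness = stiffness             # user-facing stiffness multiplier
        self.dt = dt
        self.z_rest = np.zeros(LATENT_DIM)     # latent the restoring force pulls toward

    def forces(self, z, v, pose_window):
        drive = self.W_drive @ pose_window.reshape(-1)
        restore = -self.stiffness * self.K @ (z - self.z_rest)
        dissipate = -self.C @ v
        return drive, restore, dissipate

    def step(self, z, v, pose_window):
        drive, restore, dissipate = self.forces(z, v, pose_window)
        # Semi-implicit Euler: update velocity first, then the latent with the new velocity.
        v_next = v + self.dt * (drive + restore + dissipate)
        z_next = z + self.dt * v_next
        return z_next, v_next

    def rollout(self, poses, z0, v0=None):
        """poses: (T, POSE_DIM) with T >= HISTORY. Returns (T - HISTORY + 1, LATENT_DIM)."""
        if len(poses) < HISTORY:
            raise ValueError("need at least HISTORY pose frames")
        z = np.asarray(z0, dtype=float)
        v = np.zeros(LATENT_DIM) if v0 is None else np.asarray(v0, dtype=float)
        out = []
        for t in range(HISTORY - 1, len(poses)):
            window = poses[t - HISTORY + 1 : t + 1]  # short pose history ending at frame t
            z, v = self.step(z, v, window)
            out.append(z)
        return np.stack(out)


model = LatentDynamics(stiffness=1.0)
still_poses = np.zeros((200, POSE_DIM))
a = model.rollout(still_poses, z0=np.ones(LATENT_DIM))
b = model.rollout(still_poses, z0=-np.ones(LATENT_DIM))
stiff = LatentDynamics(stiffness=4.0).rollout(still_poses, z0=np.ones(LATENT_DIM))
```

With zero pose input, the opposite initial latents `a` and `b` give mirrored trajectories that both decay toward the rest latent, and raising `stiffness` changes the oscillation. Because this toy is linear, it cannot reproduce the richer trajectory diversity the abstract describes. That would require a learned, nonlinear driving term.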
Problem

Research questions and friction points this paper is trying to address.

latent dynamics
full-body avatar
loose clothing
temporal coherence
neural rendering
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent dynamics
full-body avatars
neural rendering
temporal coherence
transformer-based decoder