🤖 AI Summary
This work addresses identity leakage and pose-dependent temporal flicker in animatable avatar editing under sparse supervision, typically a few edited keyframes. The authors attribute these failures to an ill-conditioned inversion: the edited constraints do not sufficiently determine the latent directions responsible for the intended edit. They propose a conditioning-guided constrained inversion framework that restricts updates to a low-dimensional, part-specific edit subspace of a structured avatar latent space, which prevents unintended identity changes. A local linearization of the full decoding-and-rendering pipeline yields an edit-subspace information matrix whose spectrum predicts edit stability and, through a conditioning objective, drives frame reweighting and keyframe activation. Because the method operates on small subspace matrices, it can be implemented efficiently (e.g., via Hessian-vector products), and it improves stability under limited edited supervision.
📝 Abstract
Editing animatable human avatars typically relies on sparse supervision, often a few edited keyframes, yet naively fitting a reconstructed avatar to these edits frequently causes identity leakage and pose-dependent temporal flicker. We argue that these failures are best understood as an ill-conditioned inversion: the available edited constraints do not sufficiently determine the latent directions responsible for the intended edit. We propose a conditioning-guided edited reconstruction framework that performs editing as a constrained inversion in a structured avatar latent space, restricting updates to a low-dimensional, part-specific edit subspace to prevent unintended identity changes. Crucially, we design the editing constraints during inversion by optimizing a conditioning objective derived from a local linearization of the full decoding-and-rendering pipeline, yielding an edit-subspace information matrix whose spectrum predicts stability and drives frame reweighting / keyframe activation. The resulting method operates on small subspace matrices and can be implemented efficiently (e.g., via Hessian-vector products), and improves stability under limited edited supervision.
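To make the conditioning idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gauss-Newton form of the edit-subspace information matrix, H = Σ_t w_t (J_t U)ᵀ (J_t U), where J_t is the Jacobian of the renderer at frame t and U spans the edit subspace. The toy `render` function, the helper names `subspace_information` and `greedy_keyframes`, the smallest-eigenvalue criterion, and the use of central-difference Jacobian-vector products in place of Hessian-vector products are all illustrative assumptions.

```python
import numpy as np

def render(z, pose):
    """Hypothetical stand-in for the decode-and-render pipeline.

    z[0], z[1] are edit latents seen through a pose-dependent projection;
    z[2] is an identity latent that lies outside the edit subspace.
    """
    return np.tanh(np.array([np.cos(pose) * z[0] + np.sin(pose) * z[1], z[2]]))

def subspace_information(render_fn, z, U, poses, weights, eps=1e-4):
    """Assumed Gauss-Newton information matrix restricted to span(U).

    Only k = U.shape[1] Jacobian-vector products are needed per frame,
    so the result is a small k x k matrix.
    """
    k = U.shape[1]
    H = np.zeros((k, k))
    for pose, w in zip(poses, weights):
        # Column j of J_t U, approximated by a central difference along U[:, j].
        JU = np.stack(
            [(render_fn(z + eps * U[:, j], pose) - render_fn(z - eps * U[:, j], pose))
             / (2.0 * eps) for j in range(k)],
            axis=1,
        )
        H += w * (JU.T @ JU)
    return H

def greedy_keyframes(render_fn, z, U, candidate_poses, active, budget):
    """Activate `budget` extra frames, each maximizing the smallest eigenvalue of H."""
    active = list(active)
    for _ in range(budget):
        def score(i):
            poses = [candidate_poses[j] for j in active + [i]]
            H = subspace_information(render_fn, z, U, poses, [1.0] * len(poses))
            return np.linalg.eigvalsh(H)[0]  # eigvalsh returns ascending eigenvalues
        remaining = [i for i in range(len(candidate_poses)) if i not in active]
        active.append(max(remaining, key=score))
    return active

# Usage: one edited keyframe at pose 0 leaves one edit direction unconstrained.
z0 = np.zeros(3)
U = np.eye(3)[:, :2]  # edit subspace: the first two latent coordinates
candidates = [0.0, 0.1, np.pi / 2, 0.2]
H_single = subspace_information(render, z0, U, [0.0], [1.0])
print("eigenvalues with one keyframe:", np.linalg.eigvalsh(H_single))
print("activated frames:", greedy_keyframes(render, z0, U, candidates, active=[0], budget=1))
```

In this toy setting a single keyframe yields a rank-deficient information matrix, which is the ill-conditioning the abstract describes. The greedy step then prefers the candidate pose that best constrains the unobserved edit direction. Frame reweighting could be sketched the same way by optimizing `weights` instead of the active set.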