🤖 AI Summary
Existing video-driven human avatar methods struggle to simultaneously achieve high-fidelity rendering and robust cross-pose animation generalization: they either neglect animation capability or rely on dense pose supervision, leading to severe visual degradation under sparse poses or complex clothing. This paper proposes a “Record-Retrieve-Reconstruct” framework featuring a novel temporally decoupled dynamic codebook mechanism that explicitly models appearance variation as a timestamp-dependent function, disentangling temporal and pose dimensions. Integrating NeRF-based rendering, implicit surface modeling, and pose-similarity-guided appearance retrieval, the method enables high-resolution novel-view synthesis and adaptive appearance reconstruction under unseen poses. Evaluated on extremely sparse-pose and complex-clothing scenarios, the approach surpasses state-of-the-art methods, significantly mitigating visual artifacts while delivering both photorealistic rendering quality and strong animation generalization capability.
📝 Abstract
We present R3-Avatar, which incorporates a temporal codebook to overcome the difficulty of making human avatars both animatable and high-fidelity in rendering quality. Existing video-based methods for reconstructing 3D human avatars either focus solely on rendering and lack animation support, or learn a pose-appearance mapping for animation, which degrades under limited training poses or complex clothing. In this paper, we adopt a "record-retrieve-reconstruct" strategy that ensures high-quality rendering from novel views while mitigating degradation in novel poses. Specifically, disambiguating timestamps record temporal appearance variations in a codebook, ensuring high-fidelity novel-view rendering, while novel poses retrieve corresponding timestamps by matching the most similar training poses for augmented appearance. Our R3-Avatar outperforms cutting-edge video-based human avatar reconstruction methods, particularly in overcoming visual quality degradation in extreme scenarios with limited training human poses and complex clothing.
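The record and retrieve steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the pose-similarity metric or the codebook format, so the Euclidean distance, the flattened pose vectors, and the names `pose_distance`, `retrieve_timestamp`, and `codebook` are all assumptions chosen for illustration. The reconstruct (rendering) step is omitted.

```python
import math

def pose_distance(pose_a, pose_b):
    """Euclidean distance between two flattened pose vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pose_a, pose_b)))

def retrieve_timestamp(novel_pose, training_poses):
    """Return the timestamp (frame index) of the training pose closest to novel_pose."""
    return min(
        range(len(training_poses)),
        key=lambda t: pose_distance(novel_pose, training_poses[t]),
    )

# Record: one hypothetical appearance code per training timestamp.
training_poses = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
codebook = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Retrieve: find the timestamp matching an unseen pose, then fetch its code.
novel_pose = [0.9, 0.2]
t = retrieve_timestamp(novel_pose, training_poses)
appearance_code = codebook[t]
```

Here the novel pose lies closest to the second training pose, so its recorded appearance code is the one handed on to rendering. Indexing the codebook by timestamp rather than by pose is what lets two frames with similar poses but different appearance keep separate entries.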