R3-Avatar: Record and Retrieve Temporal Codebook for Reconstructing Photorealistic Human Avatars

📅 2025-03-17

🤖 AI Summary

Existing video-driven human avatar methods struggle to achieve high-fidelity rendering and robust cross-pose animation at the same time: they either neglect animation capability or rely on dense pose supervision, which leads to severe visual degradation under sparse poses or complex clothing. This paper proposes a "record-retrieve-reconstruct" framework built around a temporal codebook that explicitly models appearance variation as a function of timestamp, disentangling the temporal and pose dimensions. Integrating NeRF-based rendering, implicit surface modeling, and pose-similarity-guided appearance retrieval, the method enables high-resolution novel-view synthesis and adaptive appearance reconstruction under unseen poses. Evaluated on scenarios with extremely sparse poses and complex clothing, it surpasses state-of-the-art methods, reducing visual artifacts while delivering both photorealistic rendering quality and strong animation generalization.

📝 Abstract

We present R3-Avatar, incorporating a temporal codebook, to overcome the inability of human avatars to be both animatable and of high-fidelity rendering quality. Existing video-based reconstruction of 3D human avatars either focuses solely on rendering, lacking animation support, or learns a pose-appearance mapping for animating, which degrades under limited training poses or complex clothing. In this paper, we adopt a "record-retrieve-reconstruct" strategy that ensures high-quality rendering from novel views while mitigating degradation in novel poses. Specifically, disambiguating timestamps record temporal appearance variations in a codebook, ensuring high-fidelity novel-view rendering, while novel poses retrieve corresponding timestamps by matching the most similar training poses for augmented appearance. Our R3-Avatar outperforms cutting-edge video-based human avatar reconstruction, particularly in overcoming visual quality degradation in extreme scenarios with limited training human poses and complex clothing.
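
The "record" step described above can be pictured as a table of appearance codes indexed by timestamp. The following is only a minimal sketch under stated assumptions, not the paper's implementation: the class name `TemporalCodebook`, the normalization of timestamps to [0, 1], the plain-list code vectors, and the linear interpolation between neighboring entries are all hypothetical choices made for illustration.

```python
from typing import List


class TemporalCodebook:
    """Toy codebook: one appearance code per recorded frame.

    Assumption (not from the paper): timestamps are normalized to [0, 1]
    and entries are evenly spaced along that range.
    """

    def __init__(self, codes: List[List[float]]):
        if not codes:
            raise ValueError("codebook needs at least one entry")
        self.codes = codes

    def lookup(self, t: float) -> List[float]:
        """Linearly interpolate between the two entries nearest to t."""
        t = min(max(t, 0.0), 1.0)           # clamp to the recorded range
        pos = t * (len(self.codes) - 1)      # fractional index into the table
        lo = int(pos)
        hi = min(lo + 1, len(self.codes) - 1)
        w = pos - lo                         # weight of the upper neighbor
        return [(1.0 - w) * a + w * b
                for a, b in zip(self.codes[lo], self.codes[hi])]


# Three recorded frames, each with a 2-dimensional appearance code.
codebook = TemporalCodebook([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])
code = codebook.lookup(0.25)  # halfway between the first two entries
print(code)
```

In a learned system the stored codes would be optimized jointly with the renderer; here they are fixed numbers so the lookup behavior is easy to inspect.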

Problem

Research questions and friction points this paper is trying to address.

Overcome the inability of avatars to be both animatable and high-fidelity

Mitigate degradation under novel poses and complex clothing

Ensure high-quality rendering from novel views

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal codebook records appearance variations

Retrieve timestamps for novel pose rendering

High-fidelity rendering with limited training poses
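
The "retrieve" idea listed above, where a novel pose looks up the timestamp of its most similar training pose, can be sketched as a nearest-neighbor search. This is an illustrative sketch only: the function names `pose_distance` and `retrieve_timestamp`, the flattened pose vectors, and the use of Euclidean distance as the similarity measure are assumptions, since the summary does not specify how pose similarity is computed.

```python
import math
from typing import List, Sequence


def pose_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two flattened pose vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("pose vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_timestamp(
    novel_pose: Sequence[float],
    train_poses: List[Sequence[float]],
    train_timestamps: List[float],
) -> float:
    """Return the timestamp of the training pose closest to novel_pose."""
    if not train_poses or len(train_poses) != len(train_timestamps):
        raise ValueError("need one timestamp per training pose")
    best = min(
        range(len(train_poses)),
        key=lambda i: pose_distance(novel_pose, train_poses[i]),
    )
    return train_timestamps[best]


# Toy data: three training poses with their recording timestamps.
train_poses = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
train_timestamps = [0.0, 0.5, 1.0]

retrieved = retrieve_timestamp([0.9, 0.2], train_poses, train_timestamps)
print(retrieved)
```

The retrieved timestamp would then index the temporal codebook, so an unseen pose reuses appearance recorded for the closest pose seen during training.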
