MUA: Mobile Ultra-detailed Animatable Avatars

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing animatable digital humans struggle to combine high-fidelity dynamic appearance with efficient deployment on mobile devices. This work proposes a wavelet-guided, multi-level spatially factorized blendshape representation, together with a tailored knowledge distillation pipeline that transfers motion-aware garment dynamics and fine-grained appearance from a high-fidelity teacher model to a lightweight student model. The approach enables real-time, on-device rendering of highly detailed digital humans on standalone VR headsets such as the Meta Quest 3, with visual quality close to that of the teacher. Relative to the teacher, the resulting model is 10 times smaller and reduces computational cost by up to a factor of 2,000; it runs at over 180 FPS on a desktop PC and at 24 FPS on the Quest 3. It significantly outperforms existing mobile-oriented avatar methods and achieves rendering quality comparable or superior to most server-only approaches.
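The teacher-to-student transfer described in the summary can be illustrated with a toy example. The sketch below is not the paper's pipeline: the `teacher` function, the dimensions, and the choice of PCA plus linear regression are all assumptions made purely for illustration. It only shows the general shape of distillation into blendshapes, where a compact student (a mean, a small basis, and a pose-to-coefficient map) is fitted to reproduce the outputs of a more expensive model.

```python
import numpy as np

rng = np.random.default_rng(0)
POSE_DIM, TEXELS, NUM_SHAPES = 6, 256, 8  # hypothetical sizes

# Stand-in "teacher": any expensive pose -> texel-offset model (assumption).
W_teacher = rng.normal(size=(POSE_DIM, TEXELS))

def teacher(poses):
    return np.tanh(poses @ W_teacher)

# 1) Query the teacher on sampled poses to build distillation targets.
poses = rng.normal(size=(512, POSE_DIM))
targets = teacher(poses)                      # (512, TEXELS)

# 2) Build a small blendshape basis from the teacher outputs (PCA via SVD).
mean = targets.mean(axis=0)
_, _, vt = np.linalg.svd(targets - mean, full_matrices=False)
basis = vt[:NUM_SHAPES]                       # (NUM_SHAPES, TEXELS)
coeffs = (targets - mean) @ basis.T           # (512, NUM_SHAPES)

# 3) Fit a linear map (with bias) from pose to blendshape coefficients.
def with_bias(p):
    return np.hstack([p, np.ones((len(p), 1))])

A, *_ = np.linalg.lstsq(with_bias(poses), coeffs, rcond=None)

def student(p):
    """Cheap student: mean + pose-driven blend of a few basis shapes."""
    return mean + (with_bias(p) @ A) @ basis

student_mse = np.mean((student(poses) - targets) ** 2)
baseline_mse = np.mean((mean - targets) ** 2)  # predicting the mean only
print(f"student MSE {student_mse:.4f} vs mean-only MSE {baseline_mse:.4f}")
```

At inference time the student needs only one small matrix product per pose, which is the kind of cost reduction that distillation is meant to buy; the actual student architecture and losses used in the paper are not specified in this summary.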

📝 Abstract

Building photorealistic, animatable full-body digital humans remains a longstanding challenge in computer graphics and vision. Recent advances in animatable avatar modeling have largely progressed along two directions: improving the fidelity of dynamic geometry and appearance, or reducing computational complexity to enable deployment on resource-constrained platforms, e.g., VR headsets. However, existing approaches fail to achieve both goals simultaneously: ultra-high-fidelity avatars typically require substantial computation on server-class GPUs, whereas lightweight avatars often suffer from limited surface dynamics, reduced appearance details, and noticeable artifacts. To bridge this gap, we propose a novel animatable avatar representation, termed Wavelet-guided Multi-level Spatial Factorized Blendshapes, and a corresponding distillation pipeline that transfers motion-aware clothing dynamics and fine-grained appearance details from a pre-trained ultra-high-quality avatar model into a compact, efficient representation. By coupling multi-level wavelet spectral decomposition with low-rank structural factorization in texture space, our method achieves up to 2000× lower computational cost and a 10× smaller model size than the original high-quality teacher avatar model, while preserving visually plausible dynamics and appearance details that closely resemble those of the teacher model. Extensive comparisons with state-of-the-art methods show that our approach significantly outperforms existing avatar approaches designed for mobile settings and achieves comparable or superior rendering quality to most approaches that can only run on servers. Importantly, our representation substantially improves the practicality of high-fidelity avatars for immersive applications, achieving over 180 FPS on a desktop PC and real-time native on-device performance at 24 FPS on a standalone Meta Quest 3.
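To make the idea of coupling a multi-level wavelet decomposition with low-rank factorization in texture space more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's representation: it uses a plain averaging Haar transform, a truncated SVD per sub-band, and a random array standing in for a texture-space map. All function names (`haar_level`, `low_rank`, `compress`, and so on) are hypothetical.

```python
import numpy as np

def haar_level(x):
    """One level of a 2D averaging Haar transform (even height and width)."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # coarse approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # detail bands
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

def inverse_haar_level(ll, details):
    """Exact inverse of haar_level."""
    lh, hl, hh = details
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

def decompose(x, levels):
    """Multi-level decomposition: coarsest LL band plus per-level details."""
    bands, ll = [], x
    for _ in range(levels):
        ll, det = haar_level(ll)
        bands.append(det)
    return ll, bands

def reconstruct(ll, bands):
    for det in reversed(bands):
        ll = inverse_haar_level(ll, det)
    return ll

def low_rank(m, rank):
    """Best rank-`rank` approximation of a matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

def compress(x, levels, rank):
    """Factorize every sub-band to low rank, then reconstruct."""
    ll, bands = decompose(x, levels)
    bands = [tuple(low_rank(b, rank) for b in det) for det in bands]
    return reconstruct(low_rank(ll, rank), bands)

rng = np.random.default_rng(0)
texture = rng.normal(size=(64, 64))       # stand-in for a texture-space map
approx = compress(texture, levels=3, rank=4)
print("relative error:", np.linalg.norm(approx - texture) / np.linalg.norm(texture))
```

Each sub-band stored at rank r needs r·(h + w) numbers instead of h·w, which is where the memory and compute savings of such a scheme would come from. Random noise is close to a worst case for low-rank compression; smooth, structured texture maps generally compress far better.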

Problem

Research questions and friction points this paper is trying to address.

animatable avatars

photorealistic rendering

computational efficiency

mobile deployment

appearance fidelity

Innovation

Methods, ideas, or system contributions that make the work stand out.

animatable avatars

wavelet decomposition

model distillation