🤖 AI Summary
To address the challenge of real-time inference of Photorealistic Codec Avatars (PCAs) on resource-constrained VR devices, this paper proposes ESCA, an algorithm–hardware co-optimization framework. Methodologically, ESCA introduces a dedicated post-training low-bit quantization scheme tailored for PCA models, integrated with a customized hardware accelerator and perceptually guided quality evaluation using FovVideoVDP. In terms of contributions and results, ESCA achieves the first full-stack optimization of PCAs, balancing high fidelity with high efficiency. Compared to the best 4-bit baseline, it improves the FovVideoVDP quality score by up to 0.39 and reduces inference latency by up to 3.36×. End-to-end measurements demonstrate sustained throughput of 100 fps, satisfying the stringent low-latency and high-frame-rate requirements of immersive VR interaction.
📝 Abstract
Photorealistic Codec Avatars (PCA), which generate high-fidelity human face renderings, are increasingly being used in Virtual Reality (VR) environments to enable immersive communication and interaction through deep learning-based generative models. However, these models impose significant computational demands, making real-time inference challenging on resource-constrained VR devices such as head-mounted displays, where latency and power efficiency are critical. To address this challenge, we propose an efficient post-training quantization (PTQ) method tailored for Codec Avatar models, enabling low-precision execution without compromising output quality. In addition, we design a custom hardware accelerator that can be integrated into the system-on-chip of VR devices to further enhance processing efficiency. Building on these components, we introduce ESCA, a full-stack optimization framework that accelerates PCA inference on edge VR platforms. Experimental results demonstrate that ESCA boosts FovVideoVDP quality scores by up to $+0.39$ over the best 4-bit baseline, delivers up to $3.36\times$ latency reduction, and sustains a rendering rate of 100 frames per second in end-to-end tests, satisfying real-time VR requirements. These results demonstrate the feasibility of deploying high-fidelity codec avatars on resource-constrained devices, opening the door to more immersive and portable VR experiences.
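To give a concrete sense of what post-training low-bit quantization involves, the sketch below shows a generic symmetric 4-bit PTQ routine in NumPy. This is an illustrative example only, not the paper's actual scheme: the function names (`quantize_4bit`, `dequantize`) and the per-channel scaling choice are assumptions for illustration; the paper's method is tailored to PCA models and paired with a custom accelerator.

```python
import numpy as np

def quantize_4bit(w, per_channel_axis=None):
    """Symmetric post-training quantization to the signed 4-bit range [-8, 7].

    Illustrative sketch, not the paper's scheme. If per_channel_axis is given,
    a separate scale is computed per slice along that axis, which typically
    reduces quantization error for weight tensors.
    """
    qmax = 7
    if per_channel_axis is None:
        scale = np.max(np.abs(w)) / qmax
    else:
        axes = tuple(i for i in range(w.ndim) if i != per_channel_axis)
        scale = np.max(np.abs(w), axis=axes, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from quantized values and scales."""
    return q.astype(np.float32) * scale

# Example: quantize a small weight matrix per output channel (axis 0).
w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_4bit(w, per_channel_axis=0)
w_hat = dequantize(q, s)
# With symmetric rounding, per-element error is bounded by half the scale.
max_err = np.max(np.abs(w - w_hat))
```

In a real deployment the int4 values would be packed two-per-byte and consumed directly by low-precision multiply-accumulate units, which is where the hardware accelerator described above comes in.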