MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

📅 2024-07-08
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
Existing neural head-avatar methods achieve promising image quality and motion fidelity but suffer from high computational overhead, hindering real-time deployment on mobile devices. To address this, we propose the first one-shot, real-time neural portrait generation framework tailored for mobile platforms. Our approach introduces a novel hybrid explicit-implicit keypoint motion representation, coupled with a precomputed visual feature synthesis mechanism, to drastically reduce modeling complexity. We further incorporate external-knowledge-guided lightweight motion modeling and streamline the U-Net backbone architecture. Experiments demonstrate that our method achieves over 100 FPS inference on smartphones at less than one-tenth the computational cost of state-of-the-art methods. Moreover, it supports dual driving modalities, video and audio, enabling versatile real-time applications. This work establishes an efficient, practical paradigm for on-device neural portrait synthesis.

📝 Abstract
Existing neural head avatar methods have achieved significant progress in the image quality and motion range of portrait animation. However, these methods neglect the computational overhead, and to the best of our knowledge, none is designed to run on mobile devices. This paper presents MobilePortrait, a lightweight one-shot neural head avatar method that reduces learning complexity by integrating external knowledge into both the motion modeling and image synthesis, enabling real-time inference on mobile devices. Specifically, we introduce a mixed representation of explicit and implicit keypoints for precise motion modeling, and precomputed visual features for enhanced foreground and background synthesis. With these two key designs and simple U-Nets as backbones, our method achieves state-of-the-art performance with less than one-tenth the computational demand. It has been validated to reach speeds of over 100 FPS on mobile devices and to support both video- and audio-driven inputs.
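The abstract's core idea, driving animation from a mixed set of explicit (detected landmark) and implicit (learned) keypoints, can be illustrated with a minimal FOMM-style sketch: per-keypoint displacements between a source and a driving frame are interpolated into a dense flow field that warps the source image. This is not MobilePortrait's actual network; the keypoint counts, Gaussian bandwidth `sigma`, and nearest-neighbor warping below are illustrative assumptions.

```python
import numpy as np

def dense_flow_from_keypoints(src_kp, drv_kp, h, w, sigma=0.1):
    """Interpolate per-keypoint displacements (src - drv) into a dense
    backward flow field using Gaussian weights centered on the driving
    keypoints. Coordinates are normalized to [-1, 1]."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    grid = np.stack([xs, ys], axis=-1)            # (h, w, 2)
    flow = np.zeros_like(grid)
    weights = np.zeros((h, w, 1))
    for s, d in zip(src_kp, drv_kp):
        wgt = np.exp(-np.sum((grid - d) ** 2, axis=-1, keepdims=True)
                     / (2 * sigma ** 2))
        flow += wgt * (s - d)                     # pull driving pose toward source
        weights += wgt
    return flow / np.maximum(weights, 1e-8)

def warp(image, flow):
    """Nearest-neighbor backward warp of an (h, w, c) image by a
    normalized flow field."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    sx = np.clip((xs + flow[..., 0] + 1) / 2 * (w - 1), 0, w - 1)
    sy = np.clip((ys + flow[..., 1] + 1) / 2 * (h - 1), 0, h - 1)
    return image[sy.round().astype(int), sx.round().astype(int)]

# Mixed representation: a few explicit landmarks (hypothetical eye/mouth
# positions) concatenated with implicit, unsupervised keypoints.
explicit_src = np.array([[-0.3, -0.2], [0.3, -0.2], [0.0, 0.3]])
implicit_src = np.random.default_rng(0).uniform(-1, 1, (5, 2))
src_kp = np.concatenate([explicit_src, implicit_src])
drv_kp = src_kp + 0.05                            # small rigid motion in the driving frame

flow = dense_flow_from_keypoints(src_kp, drv_kp, 64, 64)
out = warp(np.random.default_rng(1).random((64, 64, 3)), flow)
```

In the real system the flow would feed a lightweight U-Net generator rather than a raw warp, and the precomputed visual features mentioned in the abstract would supply background content the warp cannot recover.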
Problem

Research questions and friction points this paper is trying to address.

Enable real-time neural head avatars on mobile devices
Reduce computational overhead for mobile compatibility
Improve motion modeling and image synthesis efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight one-shot neural head avatar method
Mixed explicit-implicit keypoints for motion
Precomputed features for efficient synthesis
👥 Authors
Jianwen Jiang — ByteDance Inc.
Gaojie Lin — ByteDance Inc.
Zhengkun Rong — ByteDance Inc.
Chao Liang — ByteDance Inc.
Yongming Zhu — ByteDance Inc.
Jiaqi Yang — ByteDance Inc.
Tianyun Zhong — ByteDance Inc.