AI Summary
This work addresses single-image front-facing portrait synthesis, which is often hindered by insufficient geometric understanding, distortions in facial and hand details, and the difficulty of achieving real-time inference. To overcome these limitations, we propose the PrismMirror framework, which integrates cascaded coarse-to-fine geometry learning based on SMPL-X meshes and point clouds, rendering-supervised texture refinement, and knowledge distillation into a lightweight linear attention model, all without relying on external geometric priors. Our method achieves photorealistic reconstruction with high efficiency, and is the first to enable real-time performance (24 FPS) for monocular front-view portrait synthesis while significantly outperforming existing methods in both visual fidelity and structural accuracy.
Abstract
Photorealistic human novel view synthesis from a single image is crucial for democratizing immersive 3D telepresence, as it eliminates the need for complex multi-camera setups. However, current rendering-centric methods prioritize visual fidelity over explicit geometric understanding and struggle with intricate regions such as faces and hands, leading to temporal instability. Meanwhile, human-centric frameworks suffer from memory bottlenecks because they typically rely on an auxiliary model to supply structural priors for geometric modeling, which limits real-time performance. To address these challenges, we propose PrismMirror, a geometry-guided framework for instant frontal view synthesis from a single image. By avoiding external geometric modeling and focusing on frontal view synthesis, our model preserves visual integrity for telepresence. Specifically, PrismMirror introduces a novel cascade learning strategy for coarse-to-fine geometric feature learning: it first learns coarse geometric features, such as SMPL-X meshes and point clouds, and then refines textures through rendering supervision. To achieve real-time efficiency, we distill this unified framework into a lightweight linear attention model. Notably, PrismMirror is the first monocular human frontal view synthesis model to achieve real-time inference at 24 FPS, significantly outperforming previous methods in both visual authenticity and structural accuracy.