Mon3tr: Monocular 3D Telepresence with Pre-built Gaussian Avatars as Amortization

📅 2026-01-12
🤖 AI Summary
This work proposes Mon3tr, a framework for mobile immersive telepresence. Existing systems typically rely on multi-camera setups and high-bandwidth volumetric data transmission, which hinders real-time performance on mobile devices. Mon3tr is the first to integrate 3D Gaussian Splatting (3DGS) based parametric human modeling into telepresence, and it pairs this with an amortized inference strategy: a user-specific 3DGS avatar is built once offline from multi-view reconstruction, and during online operation only a monocular RGB video stream is needed to drive real-time pose and expression synthesis. This reduces hardware and bandwidth demands: the system renders at approximately 60 FPS on devices such as the Meta Quest 3, with an end-to-end latency of about 80 ms and bandwidth consumption below 0.2 Mbps, over 1000× lower than point-cloud streaming. The method attains a PSNR exceeding 28 dB under novel poses.

📝 Abstract
Immersive telepresence aims to transform human interaction in AR/VR applications by enabling lifelike full-body holographic representations for enhanced remote collaboration. However, existing systems rely on hardware-intensive multi-camera setups and demand high bandwidth for volumetric streaming, limiting their real-time performance on mobile devices. To overcome these challenges, we propose Mon3tr, a novel Monocular 3D telepresence framework that integrates 3D Gaussian splatting (3DGS) based parametric human modeling into telepresence for the first time. Mon3tr adopts an amortized computation strategy, dividing the process into a one-time offline multi-view reconstruction phase to build a user-specific avatar and a monocular online inference phase during live telepresence sessions. A single monocular RGB camera is used to capture body motions and facial expressions in real time to drive the 3DGS-based parametric human model, significantly reducing system complexity and cost. The extracted motion and appearance features are transmitted at < 0.2 Mbps over WebRTC's data channel, allowing robust adaptation to network fluctuations. On the receiver side, e.g., Meta Quest 3, we develop a lightweight 3DGS attribute deformation network to dynamically generate corrective 3DGS attribute adjustments on the pre-built avatar, synthesizing photorealistic motion and appearance at ~60 FPS. Extensive experiments demonstrate the state-of-the-art performance of our method, achieving a PSNR of > 28 dB for novel poses, an end-to-end latency of ~80 ms, and > 1000x bandwidth reduction compared to point-cloud streaming, while supporting real-time operation from monocular inputs across diverse scenarios. Our demos can be found at https://mon3tr3d.github.io.
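
To give a feel for what the < 0.2 Mbps figure means in practice, the following minimal sketch computes the per-update payload budget at that bitrate. The update rates used here (30 and 60 per second) are assumptions for illustration only: the abstract reports rendering at ~60 FPS but does not state how often features are transmitted. The function names are hypothetical, and protocol overhead is ignored.

```python
def per_update_budget_bytes(bitrate_mbps: float, updates_per_second: float) -> float:
    """Payload bytes available per update at a given bitrate (1 Mbps = 1e6 bits/s)."""
    if updates_per_second <= 0:
        raise ValueError("updates_per_second must be positive")
    bits_per_second = bitrate_mbps * 1_000_000
    return bits_per_second / 8 / updates_per_second


def float32_values_per_update(budget_bytes: float) -> int:
    """How many 4-byte float32 values fit in the budget (no compression assumed)."""
    return int(budget_bytes // 4)


if __name__ == "__main__":
    # 0.2 Mbps is the upper bound quoted in the abstract; the rates are assumed.
    for rate in (30, 60):
        budget = per_update_budget_bytes(0.2, rate)
        print(f"{rate} updates/s: {budget:.0f} bytes, "
              f"{float32_values_per_update(budget)} float32 values")
```

Under these assumptions each update carries at most a few hundred bytes, which is consistent with sending compact motion and appearance features to drive a pre-built avatar instead of streaming volumetric data.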
Problem

Research questions and friction points this paper is trying to address.

telepresence
monocular
3D reconstruction
bandwidth efficiency
real-time performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular 3D Telepresence
3D Gaussian Splatting
Amortized Computation
Parametric Human Modeling
Bandwidth-Efficient Streaming
Fangyu Lin
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong
Yingdong Hu
Institute for Interdisciplinary Information Sciences, Tsinghua University
computer vision, robotics
Zhening Liu
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong
Yufan Zhuang
Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong
Zehong Lin
Research Assistant Professor, Hong Kong University of Science and Technology
Edge AI, Machine Learning
Jun Zhang
Professor, Hong Kong University of Science and Technology, IEEE Fellow
Mobile Edge Computing, Edge AI, Wireless Communications, GenAI