StreamME: Simplify 3D Gaussian Avatar within Live Stream

📅 2025-07-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Real-time 3D head reconstruction from video streams faces challenges in VR and online conferencing—including high latency, excessive bandwidth consumption, facial privacy risks, and difficulty integrating with downstream applications. Method: We propose a lightweight, “train-as-you-go” framework that abandons conventional MLPs in favor of principal-point geometric modeling and 3D Gaussian point-cloud representation. It leverages only sparse geometric cues for rapid, expression-adaptive reconstruction—enabling end-to-end streaming training and inference without pre-caching. Contribution/Results: Our method achieves the first millisecond-level synchronized avatar reconstruction, balancing high-fidelity rendering with ultra-low latency. It reduces communication bandwidth by over 90% compared to NeRF-based approaches and inherently obfuscates texture details to enhance facial privacy. Evaluated on real-world VR interaction and video conferencing systems, it supports plug-and-play integration with downstream pipelines.

Technology Category

Application Category

📝 Abstract
We propose StreamME, a method focuses on fast 3D avatar reconstruction. The StreamME synchronously records and reconstructs a head avatar from live video streams without any pre-cached data, enabling seamless integration of the reconstructed appearance into downstream applications. This exceptionally fast training strategy, which we refer to as on-the-fly training, is central to our approach. Our method is built upon 3D Gaussian Splatting (3DGS), eliminating the reliance on MLPs in deformable 3DGS and relying solely on geometry, which significantly improves the adaptation speed to facial expression. To further ensure high efficiency in on-the-fly training, we introduced a simplification strategy based on primary points, which distributes the point clouds more sparsely across the facial surface, optimizing points number while maintaining rendering quality. Leveraging the on-the-fly training capabilities, our method protects the facial privacy and reduces communication bandwidth in VR system or online conference. Additionally, it can be directly applied to downstream application such as animation, toonify, and relighting. Please refer to our project page for more details: https://songluchuan.github.io/StreamME/.
Problem

Research questions and friction points this paper is trying to address.

Fast 3D avatar reconstruction from live streams
Eliminate MLP reliance in deformable 3D Gaussian Splatting
Optimize point distribution for efficient on-the-fly training
Innovation

Methods, ideas, or system contributions that make the work stand out.

On-the-fly training for fast avatar reconstruction
3D Gaussian Splatting without MLP reliance
Simplified point cloud strategy for efficiency
🔎 Similar Papers
No similar papers found.