HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video

📅 2025-10-07

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing 3D reconstruction methods rarely model geometric completeness, physical plausibility, object interactivity, and photorealistic rendering jointly. This paper introduces a framework for reconstructing physically plausible, interactive 3D scenes from a single monocular video. It proposes an interactive scene-graph representation that jointly encodes geometry, appearance, and physical properties, and it formulates reconstruction as an energy-based optimization that couples observational data, physical constraints, and generative priors. This objective is solved by combining sampling-based exploration with gradient-based refinement, yielding high-fidelity digital twins. The method is reported to outperform prior approaches on multiple benchmarks and to support novel-view synthesis, dynamic simulation, and interactive uses such as gaming and real-time digital-twin manipulation, with AR/VR and robotics as target fields. The authors position it as achieving complete geometry, photorealistic rendering, and physical stability simultaneously from monocular video input.
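To make the idea of a scene graph that stores geometry, appearance, physical properties, and inter-object relationships concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration, not HoloScene's actual representation: the class names `ObjectNode` and `SceneGraph`, the field choices, and the single "supports" relation (which object rests on which) are our assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """One object in a hypothetical interactive scene graph."""
    name: str
    vertices: list[tuple[float, float, float]]  # geometry (mesh vertices; omitted below)
    color: tuple[float, float, float]           # simplified appearance
    mass: float                                 # physical property, kg
    friction: float                             # physical property


@dataclass
class SceneGraph:
    nodes: dict[str, ObjectNode] = field(default_factory=dict)
    # Support relations: parent name -> names of objects resting directly on it.
    supports: dict[str, list[str]] = field(default_factory=dict)

    def add(self, node: ObjectNode, parent: str | None = None) -> None:
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent {parent!r}")
        self.nodes[node.name] = node
        if parent is not None:
            self.supports.setdefault(parent, []).append(node.name)

    def descendants(self, name: str) -> list[str]:
        """Objects resting directly or indirectly on `name` (they move with it)."""
        out: list[str] = []
        stack = list(self.supports.get(name, []))
        while stack:
            child = stack.pop()
            out.append(child)
            stack.extend(self.supports.get(child, []))
        return out

    def load_on(self, name: str) -> float:
        """Total mass of everything supported by `name`."""
        return sum(self.nodes[n].mass for n in self.descendants(name))


# Usage: a cup on a table on the floor (geometry left empty for brevity).
graph = SceneGraph()
graph.add(ObjectNode("floor", [], (0.5, 0.5, 0.5), mass=0.0, friction=0.8))
graph.add(ObjectNode("table", [], (0.6, 0.4, 0.2), mass=20.0, friction=0.6), parent="floor")
graph.add(ObjectNode("cup", [], (1.0, 1.0, 1.0), mass=0.3, friction=0.5), parent="table")
print(graph.descendants("floor"))  # ['table', 'cup']
```

Keeping relations as explicit edges is what makes a graph like this "interactive": moving the table can carry the cup along, and a simulator can read each object's mass and friction directly.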


📝 Abstract

Digitizing the physical world into accurate, simulation-ready virtual environments offers significant opportunities in a variety of fields such as augmented and virtual reality, gaming, and robotics. However, current 3D reconstruction and scene-understanding methods commonly fall short in one or more critical aspects, such as geometry completeness, object interactivity, physical plausibility, photorealistic rendering, or realistic physical properties for reliable dynamic simulation. To address these limitations, we introduce HoloScene, a novel interactive 3D reconstruction framework that meets all of these requirements simultaneously. HoloScene leverages a comprehensive interactive scene-graph representation, encoding object geometry, appearance, and physical properties alongside hierarchical and inter-object relationships. Reconstruction is formulated as an energy-based optimization problem, integrating observational data, physical constraints, and generative priors into a unified, coherent objective. Optimization is efficiently performed via a hybrid approach combining sampling-based exploration with gradient-based refinement. The resulting digital twins exhibit complete and precise geometry, physical stability, and realistic rendering from novel viewpoints. Evaluations conducted on multiple benchmark datasets demonstrate superior performance, while practical use-cases in interactive gaming and real-time digital-twin manipulation illustrate HoloScene's broad applicability and effectiveness. Project page: https://xiahongchi.github.io/HoloScene.
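The abstract frames reconstruction as minimizing one energy that combines observations, physical constraints, and generative priors. One plausible way to write such an objective is shown below. The symbols, the three-term split, and the weights $\lambda$ are our assumptions for illustration, not the paper's published formulation.

```latex
\mathcal{G}^{\star} = \arg\min_{\mathcal{G}} E(\mathcal{G}), \qquad
E(\mathcal{G}) = \lambda_{\mathrm{obs}}\, E_{\mathrm{obs}}\big(\mathcal{G};\, \{I_t\}_{t=1}^{T}\big)
  + \lambda_{\mathrm{phys}}\, E_{\mathrm{phys}}(\mathcal{G})
  + \lambda_{\mathrm{prior}}\, E_{\mathrm{prior}}(\mathcal{G})
```

Here $\mathcal{G}$ stands for the scene graph and $I_1, \dots, I_T$ for the video frames. Under this reading, $E_{\mathrm{obs}}$ would measure disagreement between renderings of $\mathcal{G}$ and the frames, $E_{\mathrm{phys}}$ would penalize physically implausible states such as interpenetration or objects that move when simulated at rest, and $E_{\mathrm{prior}}$ would score unobserved geometry under a generative model. Such terms are typically non-convex and partly discrete, which is one reason to pair sampling (to escape poor basins) with gradient steps (to refine continuous parameters).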

Problem

Research questions and friction points this paper is trying to address.

Creating simulation-ready 3D worlds from single videos

Overcoming limitations in geometry completeness and interactivity

Integrating physical properties and relationships for realistic simulations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive scene-graph representation for object relationships

Energy-based optimization integrating data and physical constraints

Hybrid sampling and gradient-based refinement approach
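The hybrid strategy listed above can be illustrated on a toy problem. This Python sketch is ours, not the paper's solver. `toy_energy` is a hypothetical two-parameter energy over one object's horizontal offset `u` and height `z`, with stand-ins for observation, physics, and prior terms. The observation term has two candidate matches, so pure gradient descent from a bad start would settle in the wrong basin. Random sampling picks promising starts, and gradient descent (via finite differences, standing in for autodiff) refines them.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence

Energy = Callable[[Sequence[float]], float]


def toy_energy(x: Sequence[float]) -> float:
    """Hypothetical energy over one object's (horizontal offset u, height z)."""
    u, z = x
    # Observation term: two candidate matches make the landscape multimodal.
    e_obs = min((u - 1.0) ** 2, (u + 1.0) ** 2 + 0.5) + (z - 0.2) ** 2
    # Physics term: penalize sinking below the support plane z = 0.
    e_phys = 10.0 * min(z, 0.0) ** 2
    # Prior term: weak preference for positions near the origin.
    e_prior = 0.1 * u ** 2
    return e_obs + e_phys + e_prior


def numerical_grad(f: Energy, x: Sequence[float], h: float = 1e-5) -> list[float]:
    """Central finite differences, standing in for automatic differentiation."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def hybrid_minimize(
    f: Energy,
    bounds: Sequence[tuple[float, float]],
    n_samples: int = 200,
    top_k: int = 5,
    steps: int = 500,
    lr: float = 0.05,
    seed: int = 0,
) -> tuple[list[float], float]:
    rng = random.Random(seed)
    # 1) Sampling-based exploration: draw random candidates and rank by energy.
    candidates = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples)]
    candidates.sort(key=f)
    best_x, best_e = list(candidates[0]), f(candidates[0])
    # 2) Gradient-based refinement of the top-k candidates.
    for start in candidates[:top_k]:
        x = list(start)
        for _ in range(steps):
            g = numerical_grad(f, x)
            x = [xi - lr * gi for xi, gi in zip(x, g)]
        e = f(x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


x_best, e_best = hybrid_minimize(toy_energy, bounds=[(-2.0, 2.0), (-0.5, 1.0)])
print([round(v, 3) for v in x_best], round(e_best, 4))  # [0.909, 0.2] 0.0909
```

Analytically, the global minimum of this toy energy is at $u = 10/11$, $z = 0.2$ with energy $1/11$. The second basin near $u \approx -0.91$ has energy about $0.59$, so refining several sampled starts rather than one is what keeps the search out of it.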

🔎 Similar Papers

HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction