🤖 AI Summary
Existing interactive 3D scene datasets rely on labor-intensive manual annotation of part segmentation, kinematic types, and motion trajectories, limiting scale and incurring high costs. This paper introduces an end-to-end zero-shot framework that, without any training data, jointly performs movable-part detection and segmentation, joint-type classification and motion-parameter estimation, and hidden-geometry completion directly from static 3D scenes, yielding physically plausible, interactive simulation environments. The method supports export in standard formats (e.g., USD, glTF) for seamless cross-platform simulation integration. Evaluated on diverse indoor scenes, it achieves state-of-the-art performance across all core tasks: movable-part detection, part segmentation, and joint-parameter estimation. By eliminating annotation dependence and enabling fully automated, scalable construction of interactive 3D scenes, this work advances the automation of dynamic scene generation and provides foundational support for embodied AI and immersive virtual interaction.
📄 Abstract
Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks. Our contributions include: (i) openable-object detection and segmentation to extract candidate movable parts from static scenes, (ii) articulation estimation that infers joint types and motion parameters, (iii) hidden-geometry completion followed by interactive object assembly, and (iv) interactive scene integration in widely supported formats to ensure compatibility with standard simulation platforms. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes, demonstrating the effectiveness of our framework and providing a practical foundation for scalable interactive scene generation, thereby lowering the barrier to large-scale research on articulated scene understanding. Our project page is react3d.github.io (https://react3d.github.io/).
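To make the articulation-estimation output concrete, here is a minimal, hypothetical sketch of the kind of record such a pipeline might produce for one movable part: a joint type (revolute for hinged doors, prismatic for sliding drawers), a motion axis and pivot, and motion limits. The class and field names are illustrative assumptions, not the paper's actual data structures or API.

```python
from dataclasses import dataclass
from enum import Enum
import math

class JointType(Enum):
    REVOLUTE = "revolute"    # hinged motion, e.g. cabinet doors
    PRISMATIC = "prismatic"  # sliding motion, e.g. drawers

@dataclass
class ArticulatedPart:
    """Hypothetical per-part articulation record (illustrative only)."""
    name: str
    joint_type: JointType
    axis: tuple[float, float, float]    # unit direction of rotation/translation
    origin: tuple[float, float, float]  # pivot point in the scene frame
    limits: tuple[float, float]         # (lower, upper): radians or meters

    def joint_value(self, t: float) -> float:
        """Interpolate the joint coordinate between its limits, t in [0, 1]."""
        lo, hi = self.limits
        return lo + t * (hi - lo)

# Example: a cabinet door that swings 90 degrees about a vertical hinge.
door = ArticulatedPart(
    name="cabinet_door",
    joint_type=JointType.REVOLUTE,
    axis=(0.0, 0.0, 1.0),
    origin=(0.4, 0.0, 0.0),
    limits=(0.0, math.pi / 2),
)
print(door.joint_value(1.0))  # fully open: pi/2
```

A record like this maps naturally onto the joint schemas of standard interchange formats such as USD physics joints or glTF extensions, which is what makes export to common simulation platforms straightforward.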