🤖 AI Summary
Existing digital twin approaches struggle to achieve fast reconstruction, high visual fidelity, and planning-ready collision geometry at the same time, which limits their use in high-fidelity closed-loop robotic manipulation. This work proposes an efficient framework that constructs semantically consistent, geometrically accurate, and interactive digital twins from sparse RGB images within minutes. By integrating visibility-aware semantic labeling with 3D Gaussian splatting, the method enables fast photorealistic reconstruction, while a novel filtering-based geometric conversion mechanism generates collision models suitable for motion planning. The system is integrated into a Unity-ROS2-MoveIt physics simulation pipeline and validated on a Franka Emika Panda robot, demonstrating significantly improved robustness and success rates in real-world pick-and-place tasks.
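The summary names visibility-aware semantic labeling but does not describe how it works. The sketch below shows one generic way to do it, and it is an assumption, not the authors' method: project each Gaussian centre into every calibrated view, keep only views where the projected depth agrees with the depth rendered from the reconstruction (so occluded views cannot vote), and take a majority vote over the 2D labels. The `View` structure, the pinhole model, and the `depth_tol` threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class View:
    """One calibrated view (hypothetical structure, for illustration only)."""
    R: Sequence[Sequence[float]]  # 3x3 world-to-camera rotation
    t: Vec3                       # world-to-camera translation
    fx: float
    fy: float
    cx: float
    cy: float
    labels: List[List[int]]       # 2D semantic label per pixel, indexed [row][col]
    depth: List[List[float]]      # depth rendered from the reconstruction, same size

def project(view: View, p: Vec3):
    """Return (col, row, z) of world point p in this view, or None if behind the camera."""
    x, y, z = (sum(view.R[i][j] * p[j] for j in range(3)) + view.t[i] for i in range(3))
    if z <= 0:
        return None
    col = int(round(view.fx * x / z + view.cx))
    row = int(round(view.fy * y / z + view.cy))
    return col, row, z

def fuse_labels(centers: List[Vec3], views: List[View], depth_tol: float = 0.02) -> List[int]:
    """Majority-vote a label per Gaussian centre, counting only views where it is visible."""
    fused = []
    for p in centers:
        votes: Counter = Counter()
        for view in views:
            hit = project(view, p)
            if hit is None:
                continue  # behind the camera
            col, row, z = hit
            h, w = len(view.depth), len(view.depth[0])
            if not (0 <= row < h and 0 <= col < w):
                continue  # projects outside the image
            if abs(z - view.depth[row][col]) > depth_tol:
                continue  # occluded: a nearer surface owns this pixel
            votes[view.labels[row][col]] += 1
        fused.append(votes.most_common(1)[0][0] if votes else -1)  # -1 = never visible
    return fused
```

The depth test is what makes the vote visibility-aware: without it, a point hidden behind another object would pick up that object's label from every view that sees the occluder.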
📝 Abstract
Developing high-fidelity, interactive digital twins is crucial for enabling closed-loop motion planning and reliable real-world robot execution, both of which are essential to advancing sim-to-real transfer. However, existing approaches often suffer from slow reconstruction, limited visual fidelity, and difficulty converting photorealistic models into planning-ready collision geometry. We present a practical framework that constructs high-quality digital twins within minutes from sparse RGB inputs. Our system uses 3D Gaussian Splatting (3DGS) as a unified scene representation, providing fast, photorealistic reconstruction. We enhance 3DGS with visibility-aware semantic fusion for accurate 3D labelling and introduce an efficient, filter-based geometry conversion method that produces collision-ready models integrated directly with a Unity-ROS2-MoveIt physics engine. In experiments with a Franka Emika Panda robot performing pick-and-place tasks, we show that this improved geometric accuracy supports robust manipulation in real-world trials. These results indicate that 3DGS-based digital twins, enriched with semantic and geometric consistency, offer a fast, reliable, and scalable path from perception to manipulation in unstructured environments.
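The abstract does not specify the filter used to turn Gaussians into collision geometry. As a rough illustration only, the sketch below assumes one plausible pipeline: select one object's Gaussians by fused label, drop near-transparent splats and oversized "floaters" by opacity and scale thresholds, keep only voxels supported by a minimum number of splats, and summarise them as an axis-aligned box. The `Gaussian` fields, the thresholds, and the voxel-based density filter are hypothetical choices, not the authors' method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Voxel = Tuple[int, int, int]

@dataclass
class Gaussian:
    """Minimal per-splat attributes used here (hypothetical subset of a 3DGS model)."""
    center: Vec3
    opacity: float    # in [0, 1], i.e. after the sigmoid
    max_scale: float  # largest ellipsoid axis, in metres
    label: int        # fused semantic label

def to_collision_voxels(gaussians: List[Gaussian], label: int, voxel: float = 0.01,
                        min_opacity: float = 0.5, max_scale: float = 0.05,
                        min_count: int = 2) -> List[Voxel]:
    """Filter one object's Gaussians, then keep voxels supported by >= min_count splats."""
    counts: Counter = Counter()
    for g in gaussians:
        if g.label != label or g.opacity < min_opacity or g.max_scale > max_scale:
            continue  # other object, near-transparent splat, or oversized floater
        key = tuple(int(c // voxel) for c in g.center)  # floor to a voxel index
        counts[key] += 1
    return sorted(k for k, n in counts.items() if n >= min_count)

def voxels_to_aabb(voxels: List[Voxel], voxel: float = 0.01) -> Tuple[Vec3, Vec3]:
    """Axis-aligned box (min corner, max corner) enclosing all occupied voxels."""
    lo = tuple(min(v[i] for v in voxels) * voxel for i in range(3))
    hi = tuple((max(v[i] for v in voxels) + 1) * voxel for i in range(3))
    return lo, hi
```

The occupied voxels could each be exported as a small box primitive for a planner's collision scene, or collapsed into the single enclosing box when a coarse, fast-to-check model is enough; the trade-off is collision accuracy against the number of primitives.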