🤖 AI Summary
This work addresses the visual domain gap between simulation and reality by proposing a unified, domain-agnostic point cloud representation framework that does not require explicit visual or object alignment. The approach integrates semantic features extracted from a vision-language model with a Transformer-based policy network, enabling robot policies to be trained exclusively on synthetic data while remaining effective in real-world settings. It further supports joint training with a small number of real-world demonstrations. In both single-task and multi-task scenarios, the method achieves up to a 44% improvement in zero-shot transfer success rate over existing approaches, and when augmented with limited real-world data, the improvement rises to 66%, substantially outperforming current state-of-the-art methods.
📝 Abstract
Robot foundation models are beginning to deliver on the promise of generalist robotic agents, yet progress remains constrained by the scarcity of large-scale real-world manipulation datasets. Simulation and synthetic data generation offer a scalable alternative, but their usefulness is limited by the visual domain gap between simulation and reality. In this work, we present Point Bridge, a framework that leverages unified, domain-agnostic point-based representations to unlock synthetic datasets for zero-shot sim-to-real policy transfer, without explicit visual or object-level alignment. Point Bridge combines automated point-based representation extraction via Vision-Language Models (VLMs), transformer-based policy learning, and efficient inference-time pipelines to train capable real-world manipulation agents using only synthetic data. With additional co-training on small sets of real demonstrations, Point Bridge further improves performance, substantially outperforming prior vision-based sim-and-real co-training methods. It achieves up to 44% gains in zero-shot sim-to-real transfer and up to 66% with limited real data across both single-task and multi-task settings. Videos of the robot are best viewed at: https://pointbridge3d.github.io/