GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses key bottlenecks in large-scale vision-driven embodied intelligence: the high computational cost of photorealistic rendering, the labor-intensive creation of simulation-ready 3D assets, and the Sim2Real gap in both perception and physics. The authors propose GS-Playground, a multimodal simulation framework that integrates a parallel physics engine with a batched 3D Gaussian Splatting (3DGS) rendering pipeline and reports a throughput of 10^4 FPS at 640x480 resolution. Complementing this, an automated Real2Sim reconstruction workflow generates photorealistic, physically consistent, and memory-efficient environments. Experiments on locomotion, navigation, and manipulation indicate that the framework helps bridge the perceptual and physical gaps, lowering the barrier to training large-scale vision-based RL systems.

📝 Abstract

Embodied AI research is undergoing a shift toward vision-centric perceptual paradigms. While massively parallel simulators have catalyzed breakthroughs in proprioception-based locomotion, their potential remains largely untapped for vision-informed tasks due to the prohibitive computational overhead of large-scale photorealistic rendering. Furthermore, the creation of simulation-ready 3D assets heavily relies on labor-intensive manual modeling, while the significant sim-to-real physical gap hinders the transfer of contact-rich manipulation policies. To address these bottlenecks, we propose GS-Playground, a multi-modal simulation framework designed to accelerate end-to-end perceptual learning. We develop a novel high-performance parallel physics engine, specifically designed to integrate with a batch 3D Gaussian Splatting (3DGS) rendering pipeline to ensure high-fidelity synchronization. Our system achieves a breakthrough throughput of 10^4 FPS at 640x480 resolution, significantly lowering the barrier for large-scale visual RL. Additionally, we introduce an automated Real2Sim workflow that reconstructs photorealistic, physically consistent, and memory-efficient environments, streamlining the generation of complex simulation-ready scenes. Extensive experiments on locomotion, navigation, and manipulation demonstrate that GS-Playground effectively bridges the perceptual and physical gaps across diverse embodied tasks. Project homepage: https://gsplayground.github.io.
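
To put the quoted throughput in perspective, the short sketch below works through the arithmetic implied by 10^4 FPS at 640x480. It is only a back-of-the-envelope illustration. The function names and the 10^8-frame training budget are hypothetical and do not come from the paper, and it assumes the 10^4 FPS figure is the aggregate rate across all parallel environments, which the abstract does not specify.

```python
# Back-of-the-envelope arithmetic for the throughput quoted in the abstract.
# Assumption: 10^4 FPS is the aggregate frame rate across all parallel
# environments (the abstract does not say how it is measured).

FPS = 10**4               # frames per second, as quoted in the abstract
WIDTH, HEIGHT = 640, 480  # resolution, as quoted in the abstract


def pixels_per_second(fps: int, width: int, height: int) -> int:
    """Total rendered pixels per second at the given frame rate and resolution."""
    return fps * width * height


def hours_to_render(num_frames: int, fps: int) -> float:
    """Wall-clock hours needed to render num_frames at a constant frame rate."""
    return num_frames / fps / 3600


if __name__ == "__main__":
    # Hypothetical training budget, chosen only for illustration.
    frame_budget = 10**8

    print(f"pixels per second: {pixels_per_second(FPS, WIDTH, HEIGHT):,}")
    print(f"hours for {frame_budget:,} frames: {hours_to_render(frame_budget, FPS):.2f}")
```

Under these assumptions the renderer produces about 3.07 billion pixels per second, and a budget of 10^8 frames would take a little under three hours of rendering time, ignoring policy inference and learning updates.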

Problem

Research questions and friction points this paper is trying to address.

photorealistic rendering

vision-informed robot learning

sim-to-real gap

simulation-ready 3D assets

embodied AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting

high-throughput simulation

vision-informed robot learning