🤖 AI Summary
This work addresses real-time collision detection for robotic motion planning. Existing solutions either fall short on performance when run on general-purpose hardware or, when built as specialized accelerators, lack the flexibility to adapt to new algorithms. To bridge this gap, the authors propose RoboGPU, a GPU architecture whose RoboCore unit adds robot-specific control flow mechanisms to GPU ray tracing accelerator (RTA) units. By co-designing customized control logic with the RTA while preserving compatibility with the CUDA ecosystem, RoboCore supports both classical and emerging motion planning algorithms. Experimental results show that RoboCore computes collision queries 3.1× faster than RTA implementations and 14.8× faster than a CUDA baseline. It also delivers a 3.6× speedup on a state-of-the-art neural motion planner and a 1.1× speedup on Monte Carlo localization relative to a baseline GPU, matching the performance of dedicated accelerators while retaining high flexibility.
📝 Abstract
Autonomous robots are increasingly prevalent in our society, emerging in medical care, transportation, and home assistance. These robots rely on motion planning and collision detection to identify a sequence of movements that lets them navigate to an end goal without colliding with the surrounding environment. While many specialized accelerators have been proposed to meet the real-time requirements of robotics planning tasks, they often lack the flexibility to adapt to the rapidly changing landscape of robotics and support future advancements. GPUs, by contrast, are well-positioned for robotics, and we find that they can also tackle collision detection algorithms with enhancements to existing ray tracing accelerator (RTA) units. Unlike intersection tests in ray tracing, collision queries in robotics require control flow mechanisms to avoid unnecessary computations in each query. In this work, we explore and compare different architectural modifications to address the gaps of existing GPU RTAs. Our proposed RoboGPU architecture introduces a RoboCore that computes collision queries 3.1$\times$ faster than RTA implementations and 14.8$\times$ faster than a CUDA baseline. RoboCore is also useful for other robotics tasks, achieving a 3.6$\times$ speedup on a state-of-the-art neural motion planner and a 1.1$\times$ speedup on Monte Carlo Localization compared to a baseline GPU. RoboGPU matches the performance of dedicated hardware accelerators while remaining able to adapt to evolving motion planning algorithms and to support classical ones.
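To illustrate why control flow matters for collision queries, the following is a minimal software sketch, not the RoboCore design itself. It assumes a robot approximated by spheres and obstacles given as axis-aligned boxes; the names `Sphere`, `Box`, `sphere_hits_box`, and `path_collides` are hypothetical and do not come from the paper. A path can be rejected as soon as any one test reports a collision, so an early exit skips the remaining tests that an exhaustive sweep would still perform.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Sphere:
    """Hypothetical bounding sphere for one robot link."""
    center: Vec3
    radius: float


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned obstacle, given by its min and max corners."""
    lo: Vec3
    hi: Vec3


def sphere_hits_box(s: Sphere, b: Box) -> bool:
    # Squared distance from the sphere centre to the closest point of the box.
    d2 = 0.0
    for c, lo, hi in zip(s.center, b.lo, b.hi):
        nearest = min(max(c, lo), hi)
        d2 += (c - nearest) ** 2
    return d2 <= s.radius ** 2


def path_collides(poses, obstacles, early_exit=True):
    """Return (collided, number_of_tests) for a sequence of poses.

    Each pose is a list of Sphere objects. With early_exit=True the query
    stops at the first hit; with early_exit=False every test is evaluated.
    """
    tests = 0
    hit = False
    for pose in poses:
        for sphere in pose:
            for box in obstacles:
                tests += 1
                if sphere_hits_box(sphere, box):
                    hit = True
                    if early_exit:
                        return hit, tests
    return hit, tests


# A single-sphere robot sliding along the x axis towards one obstacle.
obstacles = [Box((1.0, -1.0, -1.0), (2.0, 1.0, 1.0))]
poses = [[Sphere((x, 0.0, 0.0), 0.25)] for x in (0.0, 0.4, 0.8, 1.2, 1.6)]

print(path_collides(poses, obstacles, early_exit=True))
print(path_collides(poses, obstacles, early_exit=False))
```

In this toy scene the third pose already touches the obstacle, so the early-exit query finishes after three tests while the exhaustive sweep runs all five. The saving grows with the number of poses, links, and obstacles, which is the kind of unnecessary computation the abstract says collision queries need control flow to avoid.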