RobotPan: A 360° Surround-View Robotic Vision System for Embodied Perception

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing robotic vision systems are often limited to narrow fields of view or require manual switching among multiple cameras, which interrupts the operator and, together with motion-induced jitter, hinders effective human-robot collaboration. To address this, the work pairs a surround-view rig of six cameras and LiDAR, giving full 360-degree coverage, with RobotPan, a feed-forward framework that predicts metric-scaled, compact 3D Gaussians from calibrated sparse views. RobotPan lifts multi-view features into a unified spherical coordinate representation and decodes Gaussians using hierarchical spherical voxel priors, keeping fine resolution near the robot and coarser resolution at larger radii to cut redundancy. An online fusion mechanism updates dynamic content over long sequences while limiting growth in static regions by selectively updating appearance. The method produces substantially fewer Gaussians than prior feed-forward methods at competitive reconstruction and novel-view-synthesis quality, and the authors release a multi-sensor dataset tailored to metric 3D reconstruction and 360-degree novel view synthesis in robotics.
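The fusion idea in the summary can be illustrated with a toy example. The sketch below is an assumption-heavy illustration, not the paper's algorithm: the `Gaussian` record, the voxel-keyed dictionary, the `dynamic_keys` set, and the `blend` factor are all hypothetical names chosen for this example. It shows only the general pattern of replacing content in dynamic voxels while blending appearance in already-occupied static voxels, so the static part of the scene does not grow with each frame.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Hypothetical minimal Gaussian record: a center and an RGB color."""
    position: tuple
    color: tuple


def fuse_frame(scene, frame, dynamic_keys, blend=0.5):
    """Fuse one frame into the running scene (illustrative assumption only).

    scene, frame: dicts mapping a voxel key to a Gaussian.
    dynamic_keys: set of voxel keys flagged as dynamic in this frame.
    """
    # Dynamic voxels: discard stale content so moving objects leave no trail.
    for key in dynamic_keys:
        scene.pop(key, None)

    for key, new in frame.items():
        if key in dynamic_keys or key not in scene:
            # New or dynamic voxel: take the fresh Gaussian as-is.
            scene[key] = new
        else:
            # Occupied static voxel: keep geometry, only blend appearance.
            old = scene[key]
            color = tuple((1.0 - blend) * o + blend * n
                          for o, n in zip(old.color, new.color))
            scene[key] = Gaussian(old.position, color)
    return scene
```

With this pattern, observing the same static voxel repeatedly changes its color but never increases the number of stored Gaussians.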

📝 Abstract

Surround-view perception is increasingly important for robotic navigation and loco-manipulation, especially in human-in-the-loop settings such as teleoperation, data collection, and emergency takeover. However, current robotic visual interfaces are often limited to narrow forward-facing views, or, when multiple on-board cameras are available, require cumbersome manual switching that interrupts the operator's workflow. Both configurations suffer from motion-induced jitter that causes simulator sickness in head-mounted displays. We introduce a surround-view robotic vision system that combines six cameras with LiDAR to provide full 360° visual coverage, while meeting the geometric and real-time constraints of embodied deployment. We further present RobotPan, a feed-forward framework that predicts metric-scaled and compact 3D Gaussians from calibrated sparse-view inputs for real-time rendering, reconstruction, and streaming. RobotPan lifts multi-view features into a unified spherical coordinate representation and decodes Gaussians using hierarchical spherical voxel priors, allocating fine resolution near the robot and coarser resolution at larger radii to reduce computational redundancy without sacrificing fidelity. To support long sequences, our online fusion updates dynamic content while preventing unbounded growth in static regions by selectively updating appearance. Finally, we release a multi-sensor dataset tailored to 360° novel view synthesis and metric 3D reconstruction for robotics, covering navigation, manipulation, and locomotion on real platforms. Experiments show that RobotPan achieves competitive quality against prior feed-forward reconstruction and view-synthesis methods while producing substantially fewer Gaussians, enabling practical real-time embodied deployment. Project website: https://robotpan.github.io/
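To make the "fine near the robot, coarser at larger radii" idea concrete, here is a minimal sketch of one way a spherical voxel grid can behave that way. It is not taken from the paper: the log-spaced radial shells, the function names `spherical_voxel_index` and `shell_thickness`, and every range and bin count are assumptions chosen for illustration. The abstract does not specify how RobotPan's hierarchical priors are built.

```python
import numpy as np


def spherical_voxel_index(points, r_min=0.25, r_max=64.0,
                          n_r=32, n_theta=64, n_phi=128):
    """Map Cartesian points (N, 3) to (radius, polar, azimuth) bin indices.

    Radial shell edges are log-spaced, so shells are thin near the origin
    (the robot) and thick far away. All ranges and bin counts here are
    illustrative assumptions, not values from the paper.
    """
    points = np.asarray(points, dtype=np.float64)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x * x + y * y + z * z)

    # Polar angle in [0, pi], azimuth in [-pi, pi].
    theta = np.arccos(np.clip(z / np.maximum(r, 1e-12), -1.0, 1.0))
    phi = np.arctan2(y, x)

    # Log-spaced radial bins: the index grows with log(r).
    r_clipped = np.clip(r, r_min, r_max)
    t = np.log(r_clipped / r_min) / np.log(r_max / r_min)  # in [0, 1]

    i_r = np.minimum((t * n_r).astype(np.int64), n_r - 1)
    i_theta = np.minimum((theta / np.pi * n_theta).astype(np.int64),
                         n_theta - 1)
    i_phi = np.minimum(((phi + np.pi) / (2.0 * np.pi) * n_phi).astype(np.int64),
                       n_phi - 1)
    return np.stack([i_r, i_theta, i_phi], axis=1)


def shell_thickness(i_r, r_min=0.25, r_max=64.0, n_r=32):
    """Radial extent of shell i_r under the same log spacing."""
    edges = r_min * (r_max / r_min) ** (np.arange(n_r + 1) / n_r)
    return edges[i_r + 1] - edges[i_r]
```

Under these assumed settings, a shell near the robot is a few centimeters thick while a shell tens of meters away spans several meters, which is the kind of resolution falloff the abstract describes.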

Problem

Research questions and friction points this paper is trying to address.

surround-view perception

robotic vision

simulator sickness

embodied perception

teleoperation

Innovation

Methods, ideas, or system contributions that make the work stand out.

360-degree surround-view

3D Gaussian splatting

spherical coordinate representation