RobotPan: A 360° Surround-View Robotic Vision System for Embodied Perception

📅 2026-04-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing robotic vision systems are often constrained by narrow fields of view or require manual switching among multiple cameras, leading to operational interruptions and motion-induced disorientation that hinder effective human-robot collaboration. To address this, this work proposes RobotPan, a feedforward framework that integrates a six-camera rig with LiDAR to construct a 360-degree panoramic visual system. It unifies multi-view features in spherical coordinates and introduces a hierarchical spherical voxel prior to decode a compact 3D Gaussian representation that preserves high resolution in near regions while suppressing redundancy in distant areas. Coupled with an online dynamic fusion mechanism, the system enables real-time scene updates and controlled growth of static regions. The method substantially reduces Gaussian counts, achieving efficient streaming without compromising reconstruction fidelity or novel view synthesis quality, and introduces the first multi-sensor dataset tailored for metric 3D 360-degree reconstruction in robotics.
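The idea of a spherical voxel grid that is fine near the robot and coarse far away can be illustrated with a small sketch. This is not the paper's implementation: the function names, the log-spaced shell layout, and all default parameters (`r_min`, `r_max`, `n_shells`, `n_theta`, `n_phi`) are assumptions chosen only to show how radius-dependent resolution might be indexed.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (r, theta, phi): radius, polar angle from +z in [0, pi],
    and azimuth in [-pi, pi]. The origin maps to (0, 0, 0)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x)
    return r, theta, phi


def spherical_voxel_index(point, r_min=0.25, r_max=64.0,
                          n_shells=8, n_theta=32, n_phi=64):
    """Map a robot-centred point (metres) to (shell, theta_bin, phi_bin).

    Assumption: shell edges are log-spaced in radius, so shells are thin
    near the robot and thick far away. Radii outside [r_min, r_max] are
    clamped into the first or last shell.
    """
    r, theta, phi = cartesian_to_spherical(*point)
    r_clamped = min(max(r, r_min), r_max)
    frac = math.log(r_clamped / r_min) / math.log(r_max / r_min)
    shell = min(int(frac * n_shells), n_shells - 1)
    theta_bin = min(int(theta / math.pi * n_theta), n_theta - 1)
    phi_bin = min(int((phi + math.pi) / (2.0 * math.pi) * n_phi), n_phi - 1)
    return shell, theta_bin, phi_bin


def shell_thickness(shell, r_min=0.25, r_max=64.0, n_shells=8):
    """Radial thickness (metres) of a log-spaced shell."""
    ratio = (r_max / r_min) ** (1.0 / n_shells)
    inner = r_min * ratio ** shell
    return inner * (ratio - 1.0)


if __name__ == "__main__":
    for p in [(0.3, 0.0, 0.0), (2.0, 1.0, 0.5), (40.0, 0.0, 0.0)]:
        print(p, spherical_voxel_index(p))
```

With log-spaced shells, each shell's thickness grows in proportion to its inner radius, so a fixed number of cells covers both nearby detail and distant background. A hierarchical prior as described in the summary would presumably also vary the angular resolution per level, which this sketch does not attempt.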

📝 Abstract
Surround-view perception is increasingly important for robotic navigation and loco-manipulation, especially in human-in-the-loop settings such as teleoperation, data collection, and emergency takeover. However, current robotic visual interfaces are often limited to narrow forward-facing views, or, when multiple on-board cameras are available, require cumbersome manual switching that interrupts the operator's workflow. Both configurations suffer from motion-induced jitter that causes simulator sickness in head-mounted displays. We introduce a surround-view robotic vision system that combines six cameras with LiDAR to provide full 360° visual coverage, while meeting the geometric and real-time constraints of embodied deployment. We further present RobotPan, a feed-forward framework that predicts metric-scaled and compact 3D Gaussians from calibrated sparse-view inputs for real-time rendering, reconstruction, and streaming. RobotPan lifts multi-view features into a unified spherical coordinate representation and decodes Gaussians using hierarchical spherical voxel priors, allocating fine resolution near the robot and coarser resolution at larger radii to reduce computational redundancy without sacrificing fidelity. To support long sequences, our online fusion updates dynamic content while preventing unbounded growth in static regions by selectively updating appearance. Finally, we release a multi-sensor dataset tailored to 360° novel view synthesis and metric 3D reconstruction for robotics, covering navigation, manipulation, and locomotion on real platforms. Experiments show that RobotPan achieves competitive quality against prior feed-forward reconstruction and view-synthesis methods while producing substantially fewer Gaussians, enabling practical real-time embodied deployment. Project website: https://robotpan.github.io/
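The online fusion behaviour described in the abstract (refresh dynamic content, but only update appearance in static regions so the primitive count stays bounded) can be sketched as follows. The abstract does not specify the mechanism, so everything here is an assumption for illustration: the one-primitive-per-voxel dictionary, the `fuse_frame` name, the `blend` weight, and the exponential-moving-average colour update are all hypothetical.

```python
def fuse_frame(scene, observations, blend=0.2):
    """Fuse one frame of observations into a persistent scene.

    scene:        dict mapping voxel_key -> {"color": (r, g, b), "dynamic": bool}
    observations: iterable of (voxel_key, color, is_dynamic)

    Assumed policy (not taken from the paper):
      * empty voxel          -> insert a new primitive
      * dynamic observation  -> overwrite the stored primitive
      * static observation   -> blend appearance only; no new primitive
    """
    for key, color, is_dynamic in observations:
        existing = scene.get(key)
        if existing is None or is_dynamic:
            scene[key] = {"color": tuple(color), "dynamic": is_dynamic}
        else:
            old = existing["color"]
            existing["color"] = tuple(
                (1.0 - blend) * o + blend * c for o, c in zip(old, color)
            )
            existing["dynamic"] = False
    return scene


if __name__ == "__main__":
    scene = {}
    fuse_frame(scene, [((0, 16, 32), (1.0, 0.0, 0.0), False)])
    fuse_frame(scene, [((0, 16, 32), (0.0, 0.0, 0.0), False)])
    print(len(scene), scene[(0, 16, 32)]["color"])
```

Because a repeated static observation only modifies an existing entry, the number of stored primitives is bounded by the number of occupied voxels rather than by sequence length.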
Problem

Research questions and friction points this paper is trying to address.

surround-view perception
robotic vision
simulator sickness
embodied perception
teleoperation
Innovation

Methods, ideas, or system contributions that make the work stand out.

360-degree surround-view
3D Gaussian splatting
spherical coordinate representation
real-time embodied perception
metric-scaled reconstruction