OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera

📅 2025-11-05
🤖 AI Summary
This work addresses the challenge of robust monocular panoramic 3D semantic occupancy prediction for legged/bipedal robots under gait-induced vibrations and full 360° continuous scenes. Methodologically, we propose the first end-to-end solution relying solely on a ring-shaped panoramic camera, featuring dual-projection fusion (ring-unfolding + equirectangular), Cartesian-cylindrical dual-grid voxelization, a lightweight hierarchical AMoE-3D decoder, and a novel feature-level gait displacement compensation mechanism enabling self-supervised motion correction. Our contributions are threefold: (1) the first method to model motion distortion without IMU or LiDAR; (2) state-of-the-art performance on our newly established QuadOcc and H3O benchmarks—outperforming leading vision- and LiDAR-based approaches, with an 8.08 mIoU gain in cross-city scenarios; and (3) a lightweight, real-time deployable architecture supporting omnidirectional semantic occupancy prediction.

📝 Abstract
Robust 3D semantic occupancy is crucial for legged/humanoid robots, yet most semantic scene completion (SSC) systems target wheeled platforms with forward-facing sensors. We present OneOcc, a vision-only panoramic SSC framework designed for gait-induced body jitter and 360° continuity. OneOcc combines: (i) Dual-Projection fusion (DP-ER) to exploit the annular panorama and its equirectangular unfolding, preserving 360° continuity and grid alignment; (ii) Bi-Grid Voxelization (BGV) to reason in Cartesian and cylindrical-polar spaces, reducing discretization bias and sharpening free/occupied boundaries; (iii) a lightweight decoder with Hierarchical AMoE-3D for dynamic multi-scale fusion and better long-range/occlusion reasoning; and (iv) plug-and-play Gait Displacement Compensation (GDC) learning feature-level motion correction without extra sensors. We also release two panoramic occupancy benchmarks: QuadOcc (real quadruped, first-person 360°) and Human360Occ (H3O) (CARLA human-ego 360° with RGB, depth, and semantic occupancy; standardized within-/cross-city splits). OneOcc sets a new state of the art (SOTA): on QuadOcc it beats strong vision baselines and popular LiDAR ones; on H3O it gains +3.83 mIoU (within-city) and +8.08 mIoU (cross-city). The modules are lightweight, enabling deployable full-surround perception for legged/humanoid robots. Datasets and code will be publicly available at https://github.com/MasterHow/OneOcc.
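The Bi-Grid Voxelization (BGV) idea of reasoning in both Cartesian and cylindrical-polar spaces can be illustrated with a minimal sketch. This is not the paper's implementation; the grid resolutions, ranges, and function names below are illustrative assumptions. The point is that the same 3D points index into two complementary grids: a regular Cartesian grid, and a cylindrical grid whose angular bins align naturally with a 360° panoramic camera centered on the robot.

```python
import numpy as np

def cartesian_voxel_index(pts, voxel_size=0.2,
                          x_range=(-10, 10), y_range=(-10, 10), z_range=(-1, 3)):
    """Index Nx3 points into a regular Cartesian voxel grid."""
    origin = np.array([x_range[0], y_range[0], z_range[0]])
    return np.floor((pts - origin) / voxel_size).astype(int)

def cylindrical_voxel_index(pts, r_bins=50, theta_bins=360, z_bins=20,
                            r_max=10.0, z_range=(-1, 3)):
    """Index the same Nx3 points into a cylindrical-polar grid centered on
    the robot. Uniform angular bins give even 360-degree coverage, which
    matches the panoramic sensor geometry."""
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.mod(np.arctan2(y, x), 2 * np.pi)  # azimuth in [0, 2*pi)
    r_idx = np.clip((r / r_max * r_bins).astype(int), 0, r_bins - 1)
    t_idx = (theta / (2 * np.pi) * theta_bins).astype(int) % theta_bins
    z_idx = np.clip(((z - z_range[0]) / (z_range[1] - z_range[0]) * z_bins).astype(int),
                    0, z_bins - 1)
    return np.stack([r_idx, t_idx, z_idx], axis=1)
```

In a dual-grid design, features voxelized in each space would be fused; near the robot the cylindrical grid has finer effective resolution in azimuth, while the Cartesian grid avoids the polar grid's coarse far-range cells, which is one plausible reading of how BGV reduces discretization bias.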
Problem

Research questions and friction points this paper is trying to address.

Semantic occupancy prediction for legged robots using a single panoramic camera
Addressing gait-induced body jitter and 360° continuity challenges
Overcoming the limitations of SSC systems built for wheeled platforms with forward-facing sensors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses dual-projection fusion for 360-degree panoramic continuity
Implements bi-grid voxelization to reduce discretization bias
Employs gait displacement compensation for motion correction
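The dual-projection input described above starts from an annular (ring-shaped) panorama, which is unwrapped into a rectangular strip before feature extraction. A minimal sketch of that ring-unfolding step is below; the inner/outer radii, output size, nearest-neighbor sampling, and the assumption that the optical center sits at the image center are all simplifications, not the paper's actual pipeline.

```python
import numpy as np

def unfold_ring(annular, r_in, r_out, out_h=64, out_w=256):
    """Unwrap an annular panorama into a rectangular strip.
    Columns sweep azimuth over [0, 2*pi); rows sweep radius from
    r_in to r_out. Uses nearest-neighbor sampling for brevity and
    assumes the optical center is the image center."""
    H, W = annular.shape[:2]
    cy, cx = H / 2.0, W / 2.0
    radii = np.linspace(r_in, r_out, out_h)
    angles = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    theta, r = np.meshgrid(angles, radii)
    src_x = np.clip(np.round(cx + r * np.cos(theta)).astype(int), 0, W - 1)
    src_y = np.clip(np.round(cy + r * np.sin(theta)).astype(int), 0, H - 1)
    return annular[src_y, src_x]
```

Because the left and right edges of the unfolded strip are adjacent in azimuth, a network consuming it can preserve 360° continuity (e.g., via circular padding), which is the property the fusion of the ring view and its equirectangular unfolding is designed to exploit.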