OmniDP: Beyond-FOV Large-Workspace Humanoid Manipulation with Omnidirectional 3D Perception

πŸ“… 2026-03-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the limitations of humanoid robots operating in unstructured environments, where narrow perceptual fields hinder dexterous manipulation across large workspaces and conventional RGB-D systems suffer from self-occlusion, necessitating frequent base repositioning that compromises safety and stability. To overcome these challenges, the authors propose OmniDP, an end-to-end LiDAR-driven visuomotor policy that integrates 360° omnidirectional point cloud perception with a temporally aware attention-based pooling mechanism, thereby moving beyond the egocentric constraints of conventional depth sensing. Leveraging a whole-body teleoperation system for efficient collection of coordinated demonstration data, OmniDP achieves robust manipulation across an extended workspace without requiring additional mechanical components or external cameras. Experimental results demonstrate that the proposed method significantly outperforms monocular depth camera baselines in both simulated and real-world cluttered scenarios.

πŸ“ Abstract
The deployment of humanoid robots for dexterous manipulation in unstructured environments remains challenging due to perceptual limitations that constrain the effective workspace. In scenarios where physical constraints prevent the robot from repositioning itself, maintaining omnidirectional awareness becomes far more critical than color or semantic information. While recent advances in visuomotor policy learning have improved manipulation capabilities, conventional RGB-D solutions suffer from narrow fields of view (FOV) and self-occlusion, requiring frequent base movements that introduce motion uncertainty and safety risks. Existing approaches to expanding perception, including active vision systems and third-view cameras, introduce mechanical complexity, calibration dependencies, and latency that hinder reliable real-time performance. In this work, we propose OmniDP, an end-to-end LiDAR-driven 3D visuomotor policy that enables robust manipulation in large workspaces. Our method processes panoramic point clouds through a Time-Aware Attention Pooling mechanism, efficiently encoding sparse 3D data while capturing temporal dependencies. This 360° perception allows the robot to interact with objects across wide areas without frequent repositioning. To support policy learning, we develop a whole-body teleoperation system for efficient data collection on full-body coordination. Extensive experiments in simulation and real-world environments show that OmniDP achieves robust performance in large-workspace and cluttered scenarios, outperforming baselines that rely on egocentric depth cameras.
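The paper does not publish implementation details for the Time-Aware Attention Pooling module, so the following is only a minimal sketch of one plausible reading: per-point features from a short history of 360° LiDAR scans are scored by an attention head that also conditions on each frame's time offset, and the weighted sum yields a single observation embedding for the policy. All class names, tensor shapes, and the sinusoidal time embedding below are assumptions, not the authors' code.

```python
# Hypothetical sketch of time-aware attention pooling over LiDAR point features.
# Names, shapes, and the sinusoidal time embedding are assumed, not from the paper.
import math
import torch
import torch.nn as nn


class TimeAwareAttentionPooling(nn.Module):
    """Pool per-point features from several past 360° scans into one embedding,
    with attention logits conditioned on each scan's time offset."""

    def __init__(self, feat_dim: int = 128, time_dim: int = 32):
        super().__init__()
        self.time_dim = time_dim
        # Maps a point feature concatenated with its time embedding to a scalar logit.
        self.score = nn.Sequential(
            nn.Linear(feat_dim + time_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, 1),
        )

    def time_embedding(self, t: torch.Tensor) -> torch.Tensor:
        # Sinusoidal embedding of each frame's time offset (shape: (T, time_dim)).
        half = self.time_dim // 2
        freqs = torch.exp(
            torch.arange(half, device=t.device) * (-math.log(10000.0) / half)
        )
        ang = t[..., None] * freqs
        return torch.cat([ang.sin(), ang.cos()], dim=-1)

    def forward(self, feats: torch.Tensor, t_offsets: torch.Tensor) -> torch.Tensor:
        # feats:     (B, T, N, feat_dim) per-point features for T past scans
        # t_offsets: (T,) time offset of each scan, e.g. [-2, -1, 0]
        B, T, N, D = feats.shape
        te = self.time_embedding(t_offsets)            # (T, time_dim)
        te = te[None, :, None, :].expand(B, T, N, -1)  # broadcast to every point
        logits = self.score(torch.cat([feats, te], dim=-1))     # (B, T, N, 1)
        attn = torch.softmax(logits.view(B, T * N, 1), dim=1)   # attend over all points
        pooled = (attn * feats.view(B, T * N, D)).sum(dim=1)    # (B, feat_dim)
        return pooled


if __name__ == "__main__":
    # Example: pool features from 3 past scans of 1024 points each.
    pool = TimeAwareAttentionPooling(feat_dim=128, time_dim=32)
    feats = torch.randn(2, 3, 1024, 128)
    t_offsets = torch.tensor([-2.0, -1.0, 0.0])
    print(pool(feats, t_offsets).shape)  # torch.Size([2, 128])
```

Pooling across all points of all retained frames in a single softmax is one way to let the policy weight recent and older geometry jointly; per-frame pooling followed by temporal fusion would be an equally plausible interpretation of the paper's description.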
Problem

Research questions and friction points this paper is trying to address.

humanoid manipulation
large workspace
omnidirectional perception
field of view (FOV)
3D perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

OmniDP
LiDAR-driven visuomotor policy
omnidirectional 3D perception
Time-Aware Attention Pooling
whole-body teleoperation
Pei Qu
The Hong Kong University of Science and Technology (Guangzhou)
Zheng Li
The Hong Kong University of Science and Technology (Guangzhou)
Yufei Jia
Tsinghua University
Ziyun Liu
The Hong Kong University of Science and Technology (Guangzhou)
Liang Zhu
The Hong Kong University of Science and Technology (Guangzhou)
Haoang Li
Assistant Professor, Hong Kong University of Science and Technology (Guangzhou)
Robotics, 3D Computer Vision
Jinni Zhou
HKUST(GZ), HKUST
Jun Ma
Assistant Professor, The Hong Kong University of Science and Technology
Robotics, Autonomous Driving, Motion Planning and Control, Optimization