OmniDP: Beyond-FOV Large-Workspace Humanoid Manipulation with Omnidirectional 3D Perception

πŸ“… 2026-03-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the limitations of humanoid robots operating in unstructured environments, where narrow perceptual fields hinder dexterous manipulation across large workspaces and conventional RGB-D systems suffer from self-occlusion, necessitating frequent base repositioning that compromises safety and stability. To overcome these challenges, the authors propose OmniDP, an end-to-end LiDAR-driven visuomotor policy that integrates 360° omnidirectional point cloud perception with a temporally aware attention-based pooling mechanism, thereby moving beyond the egocentric constraints of conventional depth sensing. Leveraging a whole-body teleoperation system for efficient collection of coordinated demonstration data, OmniDP achieves robust manipulation across an extended workspace without requiring additional mechanical components or external cameras. Experimental results demonstrate that the proposed method significantly outperforms monocular depth camera baselines in both simulated and real-world cluttered scenarios.

πŸ“ Abstract
The deployment of humanoid robots for dexterous manipulation in unstructured environments remains challenging due to perceptual limitations that constrain the effective workspace. In scenarios where physical constraints prevent the robot from repositioning itself, maintaining omnidirectional awareness becomes far more critical than color or semantic information. While recent advances in visuomotor policy learning have improved manipulation capabilities, conventional RGB-D solutions suffer from narrow fields of view (FOV) and self-occlusion, requiring frequent base movements that introduce motion uncertainty and safety risks. Existing approaches to expanding perception, including active vision systems and third-view cameras, introduce mechanical complexity, calibration dependencies, and latency that hinder reliable real-time performance. In this work, we propose OmniDP, an end-to-end LiDAR-driven 3D visuomotor policy that enables robust manipulation in large workspaces. Our method processes panoramic point clouds through a Time-Aware Attention Pooling mechanism, efficiently encoding sparse 3D data while capturing temporal dependencies. This 360° perception allows the robot to interact with objects across wide areas without frequent repositioning. To support policy learning, we develop a whole-body teleoperation system for efficient data collection on full-body coordination. Extensive experiments in simulation and real-world environments show that OmniDP achieves robust performance in large-workspace and cluttered scenarios, outperforming baselines that rely on egocentric depth cameras.
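The paper does not publish implementation details for the Time-Aware Attention Pooling module, so the following is only a minimal sketch of one plausible reading: per-point features from a short history of 360° LiDAR scans are scored by an attention head that also conditions on each frame's time offset, and the weighted sum yields a single observation embedding for the policy. All class names, tensor shapes, and the sinusoidal time embedding below are assumptions, not the authors' code.

```python
# Hypothetical sketch of time-aware attention pooling over LiDAR point features.
# Names, shapes, and the sinusoidal time embedding are assumed, not from the paper.
import math
import torch
import torch.nn as nn


class TimeAwareAttentionPooling(nn.Module):
    """Pool per-point features from several past 360° scans into one embedding,
    with attention logits conditioned on each scan's time offset."""

    def __init__(self, feat_dim: int = 128, time_dim: int = 32):
        super().__init__()
        self.time_dim = time_dim
        # Maps a point feature concatenated with its time embedding to a scalar logit.
        self.score = nn.Sequential(
            nn.Linear(feat_dim + time_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, 1),
        )

    def time_embedding(self, t: torch.Tensor) -> torch.Tensor:
        # Sinusoidal embedding of each frame's time offset (shape: (T, time_dim)).
        half = self.time_dim // 2
        freqs = torch.exp(
            torch.arange(half, device=t.device) * (-math.log(10000.0) / half)
        )
        ang = t[..., None] * freqs
        return torch.cat([ang.sin(), ang.cos()], dim=-1)

    def forward(self, feats: torch.Tensor, t_offsets: torch.Tensor) -> torch.Tensor:
        # feats:     (B, T, N, feat_dim) per-point features for T past scans
        # t_offsets: (T,) time offset of each scan, e.g. [-2, -1, 0]
        B, T, N, D = feats.shape
        te = self.time_embedding(t_offsets)            # (T, time_dim)
        te = te[None, :, None, :].expand(B, T, N, -1)  # broadcast to every point
        logits = self.score(torch.cat([feats, te], dim=-1))     # (B, T, N, 1)
        attn = torch.softmax(logits.view(B, T * N, 1), dim=1)   # attend over all points
        pooled = (attn * feats.view(B, T * N, D)).sum(dim=1)    # (B, feat_dim)
        return pooled


if __name__ == "__main__":
    # Example: pool features from 3 past scans of 1024 points each.
    pool = TimeAwareAttentionPooling(feat_dim=128, time_dim=32)
    feats = torch.randn(2, 3, 1024, 128)
    t_offsets = torch.tensor([-2.0, -1.0, 0.0])
    print(pool(feats, t_offsets).shape)  # torch.Size([2, 128])
```

Pooling across all points of all retained frames in a single softmax is one way to let the policy weight recent and older geometry jointly; per-frame pooling followed by temporal fusion would be an equally plausible interpretation of the paper's description.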
Problem

Research questions and friction points this paper is trying to address.

humanoid manipulation
large workspace
omnidirectional perception
field of view (FOV)
3D perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

OmniDP
LiDAR-driven visuomotor policy
omnidirectional 3D perception
Time-Aware Attention Pooling
whole-body teleoperation
Pei Qu
The Hong Kong University of Science and Technology (Guangzhou)
Zheng Li
The Hong Kong University of Science and Technology (Guangzhou)
Yufei Jia
Tsinghua University
Ziyun Liu
The Hong Kong University of Science and Technology (Guangzhou)
Liang Zhu
The Hong Kong University of Science and Technology (Guangzhou)
Haoang Li
Assistant Professor, Hong Kong University of Science and Technology (Guangzhou)
Robotics, 3D Computer Vision
Jinni Zhou
HKUST(GZ), HKUST
Jun Ma
Assistant Professor, The Hong Kong University of Science and Technology
Robotics, Autonomous Driving, Motion Planning and Control, Optimization