Vision in Action: Learning Active Perception from Human Demonstrations

📅 2025-06-18

📈 Citations: 0

✨ Influential: 0

career value

223K/year

🤖 AI Summary

Visual occlusion in dual-arm robotic manipulation degrades perception reliability. Method: This paper proposes a task-driven active vision framework based on learning from human demonstrations. Demonstrations are collected via VR teleoperation; a 6-DoF biomimetic robotic neck and a shared 3D scene represented by Neural Radiance Fields (NeRF) enable low-latency, motion-sickness-resilient visual feedback. We introduce a novel VR-robot co-rendering and perception update mechanism with decoupled rendering and sensing pipelines, and present the first end-to-end learned robust active vision policy for multi-stage bimanual tasks—covering search, tracking, and focusing. Results: Our approach significantly outperforms baselines across three challenging occlusion-prone manipulation tasks. The learned policy exhibits strong generalization to unseen objects and configurations, maintains stable hardware deployment, and effectively mitigates VR-induced motion sickness.

Technology Category

Application Category

📝 Abstract

We present Vision in Action (ViA), an active perception system for bimanual robot manipulation. ViA learns task-relevant active perceptual strategies (e.g., searching, tracking, and focusing) directly from human demonstrations. On the hardware side, ViA employs a simple yet effective 6-DoF robotic neck to enable flexible, human-like head movements. To capture human active perception strategies, we design a VR-based teleoperation interface that creates a shared observation space between the robot and the human operator. To mitigate VR motion sickness caused by latency in the robot's physical movements, the interface uses an intermediate 3D scene representation, enabling real-time view rendering on the operator side while asynchronously updating the scene with the robot's latest observations. Together, these design elements enable the learning of robust visuomotor policies for three complex, multi-stage bimanual manipulation tasks involving visual occlusions, significantly outperforming baseline systems.

Problem

Research questions and friction points this paper is trying to address.

Learning active perception from human demonstrations

Developing a 6-DoF robotic neck for flexible movements

Mitigating VR motion sickness with 3D scene representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns active perception from human demonstrations

Uses 6-DoF robotic neck for human-like movements

VR teleoperation with real-time 3D scene rendering

🔎 Similar Papers

No similar papers found.