Observer Actor: Active Vision Imitation Learning with Sparse View Gaussian Splatting

📅 2025-11-22
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
In dual-arm manipulation, occlusions severely degrade visual observation quality and hinder policy generalization. Method: We propose the Observer-Actor (ObAct) framework, the first to dynamically assign observer and actor roles. The observer arm constructs a real-time 3D scene model via sparse-view Gaussian Splatting and actively optimizes virtual camera poses to discover unoccluded, optimal viewpoints. The actor arm then leverages this high-fidelity observation for precise manipulation. This co-adaptive design keeps test-time observations close to the occlusion-free training distribution, effectively mitigating occlusion. Results: Under the trajectory transfer and behavioral cloning paradigms, ObAct improves success rates over fixed-camera baselines by 145%/233% and 75%/143% in occlusion-free/occluded settings, respectively. These results demonstrate the effectiveness and robustness of active vision-based imitation learning for dual-arm manipulation.

๐Ÿ“ Abstract
We propose Observer Actor (ObAct), a novel framework for active vision imitation learning in which the observer moves to obtain optimal visual observations for the actor. We study ObAct on a dual-arm robotic system equipped with wrist-mounted cameras. At test time, ObAct dynamically assigns observer and actor roles: the observer arm constructs a 3D Gaussian Splatting (3DGS) representation from three images, virtually explores this representation to find an optimal camera pose, then moves to this pose; the actor arm then executes a policy using the observer's observations. This formulation enhances the clarity and visibility of both the object and the gripper in the policy's observations. As a result, we enable the training of ambidextrous policies on observations that remain closer to the occlusion-free training distribution, leading to more robust policies. We study this formulation with two existing imitation learning methods -- trajectory transfer and behavior cloning -- and experiments show that ObAct significantly outperforms static-camera setups: trajectory transfer improves by 145% without occlusion and 233% with occlusion, while behavior cloning improves by 75% and 143%, respectively. Videos are available at https://obact.github.io.
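The abstract's viewpoint-optimization step (virtually explore the 3DGS scene, score candidate camera poses, move to the best one) can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: it samples candidate camera positions on a view sphere around the object and scores each by a simple line-of-sight clearance heuristic, standing in for the actual 3DGS render-based visibility objective. All function names and parameters here are hypothetical.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))

def sample_viewsphere(center, radius, n_az=8, n_el=3):
    """Sample candidate camera positions on a hemisphere around the target."""
    poses = []
    for i in range(n_az):
        az = 2 * math.pi * i / n_az
        for j in range(1, n_el + 1):
            el = (math.pi / 2) * j / (n_el + 1)  # stay above the table plane
            poses.append((
                center[0] + radius * math.cos(el) * math.cos(az),
                center[1] + radius * math.cos(el) * math.sin(az),
                center[2] + radius * math.sin(el),
            ))
    return poses

def point_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    ab = sub(b, a)
    t = max(0.0, min(1.0, dot(sub(p, a), ab) / dot(ab, ab)))
    closest = tuple(a[i] + t * ab[i] for i in range(3))
    return norm(sub(p, closest))

def visibility_score(cam, target, obstacles, clearance=0.10):
    """Heuristic stand-in for a render-based score: 1.0 if the
    camera-to-target ray clears every obstacle by `clearance` meters."""
    d = min(point_to_segment(o, cam, target) for o in obstacles)
    return min(d / clearance, 1.0)

def best_viewpoint(target, obstacles, radius=0.4):
    """Pick the candidate pose with the least-occluded view of the target."""
    cands = sample_viewsphere(target, radius)
    return max(cands, key=lambda c: visibility_score(c, target, obstacles))
```

In the paper's actual pipeline the scoring would come from rendering the sparse-view 3DGS model at each virtual pose and evaluating object and gripper visibility; the search structure (sample poses, score, move to the argmax) is the part this sketch illustrates.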
Problem

Research questions and friction points this paper is trying to address.

Active vision imitation learning with dynamic observer-actor role assignment
Improving policy robustness by optimizing camera viewpoints for occlusion-free observations
Enhancing dual-arm robotic manipulation through sparse view 3D Gaussian Splatting representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active vision imitation learning with dynamic role assignment
3D Gaussian Splatting for optimal viewpoint exploration
Ambidextrous policies trained on occlusion-free observations