Strategic Vantage Selection for Learning Viewpoint-Agnostic Manipulation Policies

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-based manipulation policies exhibit poor out-of-distribution generalization to unseen camera viewpoints—a limitation that is especially critical in dynamic settings where camera pose must be adjusted online. To address this, we propose Vantage, the first framework that formulates viewpoint selection as a continuous optimization problem. Leveraging Bayesian optimization, Vantage jointly optimizes viewpoint sampling and policy learning to efficiently identify a minimal set of highly informative viewpoints within the continuous viewpoint space, eliminating redundant full-spectrum sampling. Our approach integrates multi-view representation learning, iterative policy fine-tuning, and an exploration-exploitation balancing mechanism. Evaluated on multiple standard robotic manipulation tasks, Vantage achieves average performance gains of up to 46.19% over fixed, random, and heuristic viewpoint selection baselines, yielding more robust, lightweight, and viewpoint-agnostic manipulation policies.

📝 Abstract
Vision-based manipulation has shown remarkable success, achieving promising performance across a range of tasks. However, these manipulation policies often fail to generalize beyond their training viewpoints, which is a persistent challenge in achieving perspective-agnostic manipulation, especially in settings where the camera is expected to move at runtime. Although collecting data from many angles seems a natural solution, such a naive approach is both resource-intensive and degrades manipulation policy performance due to excessive and unstructured visual diversity. This paper proposes Vantage, a framework that systematically identifies and integrates data from optimal perspectives to train robust, viewpoint-agnostic policies. By formulating viewpoint selection as a continuous optimization problem, we iteratively fine-tune policies on a few vantage points. Since we leverage Bayesian optimization to efficiently navigate the infinite space of potential camera configurations, we are able to balance exploration of novel views and exploitation of high-performing ones, thereby ensuring data collection from a minimal number of effective viewpoints. We empirically evaluate this framework on diverse standard manipulation tasks using multiple policy learning methods, demonstrating that fine-tuning with data from strategic camera placements yields substantial performance gains, achieving average improvements of up to 46.19% when compared to fixed, random, or heuristic-based strategies.
Problem

Research questions and friction points this paper is trying to address.

Generalizing vision-based manipulation policies across viewpoints
Reducing resource-intensive data collection from multiple angles
Optimizing camera viewpoints for robust policy training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes camera viewpoints via continuous Bayesian optimization
Trains robust policies with minimal strategic viewpoints
Balances exploration and exploitation for effective data
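The exploration-exploitation loop described above can be sketched in miniature. The snippet below is a hypothetical, dependency-free illustration (not the paper's implementation): it treats viewpoint selection over a single continuous camera angle as black-box optimization, using a kernel-weighted surrogate with an upper-confidence-bound acquisition to trade off exploiting high-performing viewpoints against exploring under-sampled ones. The `objective` stands in for the expensive step of collecting data at a viewpoint and measuring fine-tuned policy performance; all names here are illustrative assumptions.

```python
import math
import random


def surrogate(history, x, kernel_width=0.5):
    """Kernel-weighted estimate of policy performance at viewpoint x,
    plus a simple density-based uncertainty (fewer nearby samples -> higher)."""
    if not history:
        return 0.0, 1.0
    weights = [math.exp(-((x - xi) ** 2) / (2 * kernel_width ** 2))
               for xi, _ in history]
    total = sum(weights)
    mean = (sum(w * yi for w, (_, yi) in zip(weights, history)) / total
            if total > 1e-9 else 0.0)
    uncertainty = 1.0 / (1.0 + total)
    return mean, uncertainty


def select_viewpoint(objective, bounds, n_iters=20, beta=1.5, seed=0):
    """Toy Bayesian-optimization loop over a continuous viewpoint range.

    objective: black-box proxy for policy success at a camera angle.
    bounds:    (low, high) of the continuous viewpoint parameter.
    beta:      exploration weight in the UCB acquisition.
    Returns the best (viewpoint, score) pair observed.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    history = []
    for _ in range(n_iters):
        # Candidate pool drawn from the continuous viewpoint space.
        candidates = [rng.uniform(lo, hi) for _ in range(64)]
        # UCB acquisition: predicted performance + beta * uncertainty.
        def ucb(x):
            mean, unc = surrogate(history, x)
            return mean + beta * unc
        chosen = max(candidates, key=ucb)
        history.append((chosen, objective(chosen)))  # expensive evaluation
    return max(history, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical objective: policy success peaks at azimuth ~1.2 rad.
    peak = 1.2
    toy_objective = lambda a: math.exp(-((a - peak) ** 2))
    best_x, best_y = select_viewpoint(toy_objective, bounds=(0.0, math.pi))
    print(f"best viewpoint ~ {best_x:.2f} rad, score {best_y:.2f}")
```

In the full framework each evaluation would involve collecting demonstrations from the candidate viewpoint and fine-tuning the policy, so keeping the number of evaluations small, as the acquisition function does here, is the central efficiency argument.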