🤖 AI Summary
In multi-view visual reinforcement learning, methods face a fundamental trade-off among sample efficiency, deployment overhead, and policy robustness. To address this, we propose the Merge And Disentanglement (MAD) framework: it enhances representation capacity via multi-view feature fusion while explicitly disentangling view-specific features to enable lightweight single-view deployment. MAD further integrates disentangled representation learning with visual-servoing-inspired Q-learning, jointly improving sample efficiency and policy generalization. Experiments on Meta-World and ManiSkill3 show substantial gains, including an average task success rate improvement of +12.7% and up to 1.8× higher sample efficiency over prior state-of-the-art methods, along with strong robustness and generalization to unseen scenarios and tasks.
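The merge-then-disentangle idea described above can be illustrated with a minimal sketch. This is not the actual MAD implementation (which uses image encoders and RL-specific losses; see the project code): all names, sizes, and the averaging fusion below are illustrative assumptions. The key structure is that each view is encoded separately, features are merged for training, and an alignment term pulls single-view features toward the merged feature so one camera suffices at deployment.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(view, W):
    # Per-view encoder; a linear map with tanh stands in for a CNN here.
    return np.tanh(view @ W)

def merge(features):
    # Merge multi-view features; averaging is one simple fusion choice.
    return np.mean(features, axis=0)

# Two camera views of the same scene (flattened images, hypothetical sizes).
view_a = rng.normal(size=(1, 64))
view_b = rng.normal(size=(1, 64))
W = rng.normal(size=(64, 32)) * 0.1  # shared encoder weights

z_a, z_b = encode(view_a, W), encode(view_b, W)
z_merged = merge([z_a, z_b])

# Alignment objective (sketch): pull each single-view feature toward the
# merged feature, so a policy trained on merged features can later run
# from a single camera with minimal degradation.
align_loss = np.mean((z_a - z_merged) ** 2) + np.mean((z_b - z_merged) ** 2)
```

In this toy version the merged feature is just the per-view mean, so minimizing `align_loss` drives the two view encodings toward agreement; the paper's actual disentanglement losses are more involved.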
📝 Abstract
Vision is well known for its use in manipulation, especially through visual servoing. To make it robust, multiple cameras are needed to expand the field of view, which is computationally challenging. Merging multiple views and using Q-learning allow the design of more effective representations and the optimization of sample efficiency, but such a solution can be expensive to deploy. To mitigate this, we introduce a Merge And Disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while augmenting with single-view features to allow lightweight deployment and ensure robust policies. We demonstrate the efficiency and robustness of our approach on Meta-World and ManiSkill3. For the project website and code, see https://aalmuzairee.github.io/mad
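The Q-learning component the abstract refers to can be sketched in its simplest form: a temporal-difference update on a Q-function defined over encoder features. This is a generic TD(0) sketch, not MAD's actual actor-critic objective; the linear Q-head, sizes, and values below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
n_features, n_actions = 8, 3
W = np.zeros((n_features, n_actions))  # linear Q-head over encoder features
gamma, lr = 0.99, 0.1                  # discount and learning rate

# One transition: current/next features, action taken, reward received.
z, z_next = rng.normal(size=n_features), rng.normal(size=n_features)
a, r = 1, 0.5

# TD(0) target and error for Q(z, a) = (z @ W)[a].
td_target = r + gamma * np.max(z_next @ W)
td_error = td_target - (z @ W)[a]

# Gradient step on the squared TD error (only the taken action's column moves).
W[:, a] += lr * td_error * z
```

In practice the features `z` would come from the merged multi-view encoder, so representation learning and Q-learning improve sample efficiency jointly.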