Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation

📅 2025-05-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In multi-view visual reinforcement learning, a fundamental trade-off exists among low sample efficiency, high deployment overhead, and weak policy robustness. To address this, we propose the Merge And Disentanglement (MAD) framework: it enhances representation capacity via multi-view feature fusion while explicitly disentangling view-specific features to enable lightweight single-view deployment. MAD further integrates disentangled representation learning with visual-servoing-inspired Q-learning optimization, jointly improving sample efficiency and policy generalization. Experiments on Meta-World and ManiSkill3 demonstrate that MAD achieves substantial gains—average task success rate improvement of +12.7% and up to 1.8× higher sample efficiency over prior state-of-the-art methods—while exhibiting strong robustness and generalization across unseen scenarios and tasks.

Technology Category

Application Category

📝 Abstract
Vision is well-known for its use in manipulation, especially using visual servoing. To make it robust, multiple cameras are needed to expand the field of view. That is computationally challenging. Merging multiple views and using Q-learning allows the design of more effective representations and optimization of sample efficiency. Such a solution might be expensive to deploy. To mitigate this, we introduce a Merge And Disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while augmenting with single-view features to allow lightweight deployment and ensure robust policies. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3. For project website and code, see https://aalmuzairee.github.io/mad
Problem

Research questions and friction points this paper is trying to address.

Merge multiple camera views for robust manipulation
Optimize sample efficiency in visual reinforcement learning
Enable lightweight deployment with single-view features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Merge multiple views for robust vision
Use Q-learning for effective representations
Introduce MAD algorithm for lightweight deployment
🔎 Similar Papers
No similar papers found.
A
Abdulaziz Almuzairee
University of California San Diego
R
Rohan Patil
University of California San Diego
Dwait Bhatt
Dwait Bhatt
Robotics Graduate Student, UC San Diego
Reinforcement LearningRoboticsMachine LearningOn-Device AI
H
Henrik I. Christensen
University of California San Diego