🤖 AI Summary
This work addresses the challenge of integrating high-level semantic reasoning with safe, compliant physical interaction in dual-arm mobile manipulation. The authors propose a hierarchical cyber-physical framework that, for the first time, combines a vision-language model with retrieval-augmented generation (VLM-RAG), whole-body model predictive control (MPC), and unified impedance–admittance control. An experience vector database maps semantic context to dynamic velocity limits, safety boundaries, and virtual impedance parameters without retraining, giving the robot context-aware compliant manipulation capabilities. Experiments in MuJoCo, IsaacSim, and on a physical dual-arm platform show a 60% reduction in robot speed near humans, supporting safe and socially aware navigation and manipulation in human–robot interaction.
📝 Abstract
Bimanual mobile manipulation requires seamless integration of high-level semantic reasoning with safe, compliant physical interaction, a challenge that end-to-end models approach opaquely and that classical controllers lack the context to address. This paper presents GenerativeMPC, a hierarchical cyber-physical framework that explicitly bridges semantic scene understanding with physical control parameters for bimanual mobile manipulators. The system uses a Vision-Language Model with Retrieval-Augmented Generation (VLM-RAG) to translate visual and linguistic context into grounded control constraints, specifically dynamic velocity limits and safety margins for a Whole-Body Model Predictive Controller (MPC). Simultaneously, the VLM-RAG module modulates virtual stiffness and damping gains for a unified impedance-admittance controller, enabling context-aware compliance during human-robot interaction. Our framework leverages an experience-driven vector database to ensure consistent parameter grounding without retraining. Experimental results in MuJoCo, IsaacSim, and on a physical bimanual platform confirm a 60% speed reduction near humans and safe, socially aware navigation and manipulation through semantic-to-physical parameter grounding. This work advances the field of human-centric cybernetics by grounding large-scale cognitive models in predictable, high-frequency physical control loops.
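
The semantic-to-physical grounding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that scene context has already been embedded as a vector (the VLM is not modeled here), that the experience database is a small in-memory list, and that retrieval is a nearest-neighbor lookup by cosine similarity. The names `ControlParams`, `EXPERIENCES`, `retrieve_params`, and `clamp_speed`, the three-dimensional embeddings, and all numeric values are hypothetical placeholders; only the 60% ratio between the two speed limits is chosen to mirror the reported reduction.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlParams:
    """Hypothetical parameter set handed to the low-level controllers."""
    max_speed: float      # m/s, velocity limit for the whole-body MPC
    safety_margin: float  # m, minimum clearance enforced by the MPC
    stiffness: float      # N/m, virtual stiffness for the compliance controller
    damping: float        # N*s/m, virtual damping for the compliance controller


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Assumed experience database: (context embedding, stored parameters).
# Embeddings and values are illustrative placeholders, not from the paper.
EXPERIENCES = [
    ([1.0, 0.0, 0.0], ControlParams(1.0, 0.3, 800.0, 60.0)),  # open space
    ([0.0, 1.0, 0.0], ControlParams(0.4, 0.8, 200.0, 40.0)),  # human nearby
    ([0.0, 0.0, 1.0], ControlParams(0.6, 0.5, 400.0, 50.0)),  # fragile object
]


def retrieve_params(query_embedding):
    """Return the parameters of the most similar stored experience."""
    _, best_params = max(
        EXPERIENCES,
        key=lambda entry: cosine_similarity(query_embedding, entry[0]),
    )
    return best_params


def clamp_speed(commanded_speed, params):
    """Limit a commanded speed magnitude to the retrieved velocity bound."""
    return min(abs(commanded_speed), params.max_speed)


if __name__ == "__main__":
    # A query embedding that lies closest to the "human nearby" experience.
    params = retrieve_params([0.1, 0.9, 0.0])
    print(params)
    print(clamp_speed(0.9, params))
```

Because the parameters come from a lookup rather than from model weights, adding a new entry to the list changes the robot's behavior in that context without any retraining, which is the property the abstract attributes to the experience-driven database.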