🤖 AI Summary
Robust 6-DoF grasping of diverse objects in unstructured, cluttered environments is still hampered by inaccurate grasp pose prediction and by object pose changes during execution. This paper proposes a value-guided, vision-based closed-loop grasping method that integrates a value function, trained on large-scale synthetic data, into a Model Predictive Control (MPC) framework. By fusing visual observations with collision-avoidance and motion-smoothness constraints online, the approach enables real-time trajectory optimization and reactive behavior. Crucially, it generalizes to novel objects without real-world fine-tuning, significantly improving system stability and adaptability. In simulation and in noisy real-world environments, the method achieves absolute improvements in grasp success rate of 32.6% and 33.3%, respectively, substantially outperforming open-loop methods, diffusion-based policies, Transformer-based policies, and Implicit Q-Learning (IQL) baselines.
📝 Abstract
Grasping of diverse objects in unstructured environments remains a significant challenge. Open-loop grasping methods, effective in controlled settings, struggle in cluttered environments. Grasp prediction errors and object pose changes during grasping are the main causes of failure. In contrast, closed-loop methods address these challenges in simplified settings (e.g., single object on a table) on a limited set of objects, with no path to generalization. We propose Grasp-MPC, a closed-loop 6-DoF vision-based grasping policy designed for robust and reactive grasping of novel objects in cluttered environments. Grasp-MPC incorporates a value function, trained on visual observations from a large-scale synthetic dataset of 2 million grasp trajectories that include successful and failed attempts. We deploy this learned value function in an MPC framework in combination with other cost terms that encourage collision avoidance and smooth execution. We evaluate Grasp-MPC on FetchBench and real-world settings across diverse environments. Grasp-MPC improves grasp success rates by up to 32.6% in simulation and 33.3% in real-world noisy conditions, outperforming open-loop, diffusion policy, transformer policy, and IQL approaches. Videos and more at http://grasp-mpc.github.io.
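To make the cost structure concrete, here is a minimal sketch of a sampling-based MPC step that combines a learned value term with collision-avoidance and smoothness penalties, as the abstract describes. Everything here is an illustrative assumption, not the paper's implementation: `learned_value` is a toy stand-in for the value function trained on the 2M-trajectory dataset, the obstacle model, cost weights, and the cross-entropy-style optimizer are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def learned_value(traj):
    # Stand-in for the learned value function over visual observations;
    # here a toy proxy that rewards ending near a hypothetical grasp pose.
    goal = np.array([0.5, 0.0, 0.2])
    return -np.linalg.norm(traj[-1] - goal)

def collision_cost(traj, obstacle=np.array([0.3, 0.1, 0.2]), radius=0.05):
    # Penalize waypoints that enter a spherical keep-out region (assumed obstacle model).
    d = np.linalg.norm(traj - obstacle, axis=1)
    return np.maximum(radius - d, 0.0).sum()

def smoothness_cost(traj):
    # Penalize large second differences (accelerations) along the trajectory.
    return np.square(np.diff(traj, n=2, axis=0)).sum()

def mpc_step(start, horizon=8, samples=256, iters=3, elites=16):
    # Cross-entropy-method style sampling MPC over delta-position actions.
    mean = np.zeros((horizon, 3))
    std = np.full((horizon, 3), 0.05)
    for _ in range(iters):
        actions = mean + std * rng.standard_normal((samples, horizon, 3))
        trajs = start + np.cumsum(actions, axis=1)  # rollout to waypoint positions
        costs = np.array([
            -learned_value(t) + 10.0 * collision_cost(t) + 1.0 * smoothness_cost(t)
            for t in trajs
        ])
        elite = actions[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-4
    return mean[0]  # execute only the first action, then replan (closed loop)

action = mpc_step(np.zeros(3))
print(action.shape)  # (3,)
```

The closed-loop character comes from the last line of `mpc_step`: only the first planned action is executed before re-sensing and re-optimizing, which is what lets the controller react to grasp prediction errors and object pose changes mid-execution.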