🤖 AI Summary
To address robustness and efficiency challenges in non-prehensile manipulation involving multiple objects, heterogeneous physical properties, and complex contact interactions, this paper proposes C3+ (Consensus Complementarity Control Plus), an enhanced Contact-Implicit Model Predictive Control (CI-MPC) algorithm embedded in an end-to-end, vision-driven planar pushing pipeline. The pipeline combines object scanning, 3D mesh reconstruction, and hardware execution, and C3+ solves substantially faster than its predecessor C3. The work extends CI-MPC to multi-object pushing tasks that require de-cluttering the environment, which earlier CI-MPC methods could not handle because of their scalability and complexity limits. Experiments across 33 objects with diverse geometries and physical properties report a 98% success rate for single-object tasks and 92% for four-object tasks, with average times-to-goal of about 0.5 minutes and 5.3 minutes, respectively. Pose estimation accuracy is reported to surpass baseline methods, supporting improvements in both reliability and computational efficiency for vision-guided, contact-rich manipulation.
📝 Abstract
Non-prehensile manipulation of diverse objects remains a core challenge in robotics, driven by unknown physical properties and the complexity of contact-rich interactions. Recent advances in contact-implicit model predictive control (CI-MPC), with contact reasoning embedded directly in the trajectory optimization, have shown promise in tackling the task efficiently and robustly, yet demonstrations have been limited to narrowly curated examples. In this work, we showcase the broader capabilities of CI-MPC through precise planar pushing tasks over a wide range of object geometries, including multi-object domains. These scenarios demand reasoning over numerous inter-object and object-environment contacts to strategically manipulate and de-clutter the environment, challenges that were intractable for prior CI-MPC methods. To achieve this, we introduce Consensus Complementarity Control Plus (C3+), an enhanced CI-MPC algorithm integrated into a complete pipeline spanning object scanning, mesh reconstruction, and hardware execution. Compared to its predecessor C3, C3+ achieves substantially faster solve times, enabling real-time performance even in multi-object pushing tasks. On hardware, our system achieves an overall 98% success rate across 33 objects, reaching pose goals within tight tolerances. The average time-to-goal is approximately 0.5, 1.6, 3.2, and 5.3 minutes for 1-, 2-, 3-, and 4-object tasks, respectively. Project page: https://dairlab.github.io/push-anything.