🤖 AI Summary
Problem: Contact-point selection for multi-robot object pushing suffers from combinatorial explosion, as the space of contact-point combinations grows rapidly with robot count and object size, limiting the scalability of analytical optimization.
Method: This paper proposes ConPoSe, the first approach to leverage large language models (LLMs) for contact-point selection. It exploits their commonsense reasoning to guide search in high-dimensional combinatorial spaces, while integrating local search to ensure physical feasibility and efficient convergence.
Contribution/Results: ConPoSe overcomes the computational bottlenecks of traditional analytical methods in large-scale scenarios. It is validated on cuboid, cylindrical, and T-shaped objects. Experiments demonstrate significant improvements over both pure analytical baselines and LLM-only selection in contact-point quality and computational scalability, pointing toward physics-aware, LLM-guided robotic coordination.
📝 Abstract
Object transportation in cluttered environments is a fundamental task in various domains, including domestic service and warehouse logistics. In cooperative object transport, multiple robots must coordinate to move objects that are too large for a single robot. One transport strategy is pushing, which only requires simple robots. However, careful selection of robot-object contact points is necessary to push the object along a preplanned path. Although this selection can be solved analytically, the solution space grows combinatorially with the number of robots and object size, limiting scalability. Inspired by how humans rely on common-sense reasoning for cooperative transport, we propose combining the reasoning capabilities of Large Language Models with local search to select suitable contact points. Our LLM-guided local search method for contact point selection, ConPoSe, successfully selects contact points for a variety of shapes, including cuboids, cylinders, and T-shapes. We demonstrate that ConPoSe scales better with the number of robots and object size than the analytical approach, and also outperforms pure LLM-based selection.
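The pipeline the abstract describes, an LLM proposing an initial contact-point assignment that a local search then refines, can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the LLM call is replaced by a random stub, the objective is a stand-in (spreading robots over distinct object sides as a crude proxy for balanced pushing), and all names (`llm_propose`, `local_search`, `score`) are hypothetical.

```python
import random

def candidate_points(n_sides=4, per_side=3):
    """Discretize an object's perimeter into candidate contact points,
    identified here by (side, slot) pairs."""
    return [(side, i) for side in range(n_sides) for i in range(per_side)]

def llm_propose(candidates, n_robots):
    """Stand-in for the LLM proposal step. A real system would prompt
    the model with the object shape and planned path; here we just
    sample a plausible initial assignment."""
    return random.sample(candidates, n_robots)

def score(assignment):
    """Toy objective: count distinct sides covered (a rough proxy for
    a balanced push; the actual criterion would be physics-based)."""
    return len({side for side, _ in assignment})

def local_search(assignment, candidates, iters=100):
    """Hill-climb by reassigning one robot at a time, accepting moves
    that keep contact points distinct and do not lower the score."""
    best, best_s = assignment, score(assignment)
    for _ in range(iters):
        nbr = best.copy()
        nbr[random.randrange(len(nbr))] = random.choice(candidates)
        if len(set(nbr)) == len(nbr) and score(nbr) >= best_s:
            best, best_s = nbr, score(nbr)
    return best

random.seed(0)
cands = candidate_points()
init = llm_propose(cands, n_robots=3)
result = local_search(init, cands)
```

The division of labor mirrors the abstract's motivation: the (stubbed) LLM supplies a commonsense starting point so the local search explores a small neighborhood instead of the full combinatorial space.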