🤖 AI Summary
To address the challenges of heterogeneous robot teams—namely, poor responsiveness to free-form natural language instructions, degradation in long-horizon collaborative execution, and hallucination in planning—this paper proposes a hybrid decentralized scheduling framework. The method integrates a task dependency graph generated by a large language model (LLM) with a capability-aware robot–task assignment matrix, feeding both into a mixed-integer linear programming (MILP) solver that computes time-optimal schedules minimizing makespan. At the lower level, agentic closed-loop control enables decentralized autonomous execution. Evaluated on multiple natural language–driven collaborative benchmarks, the approach achieves significantly higher task success rates for two-robot heterogeneous teams than state-of-the-art generative planning methods, and hardware experiments demonstrate real-time deployment feasibility on quadruped platforms. The core contribution is a rigorous closed loop linking LLM-based semantic understanding with formal, verifiable scheduling—thereby reconciling expressive natural-language specification with executable, certifiable control.
📝 Abstract
Coordinating heterogeneous robot teams from free-form natural-language instructions is hard. Language-only planners struggle with long-horizon coordination and hallucination, while purely formal methods require closed-world models. We present FLEET, a hybrid decentralized framework that turns language into optimized multi-robot schedules. An LLM front-end produces (i) a task graph with durations and precedence and (ii) a capability-aware robot–task fitness matrix; a formal back-end solves a makespan-minimization problem while the underlying robots execute their free-form subtasks with agentic closed-loop control. Across multiple free-form, language-guided coordination benchmarks, FLEET improves success rates over state-of-the-art generative planners on two-agent teams across heterogeneous tasks. Ablations show that mixed-integer linear programming (MILP) primarily improves temporal structure, while LLM-derived fitness is decisive for capability-coupled tasks; together they deliver the highest overall performance. We demonstrate the translation to real-world challenges with hardware trials using a pair of quadruped robots with disjoint capabilities.
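To make the back-end's role concrete, the sketch below shows the kind of capability-constrained, precedence-respecting makespan minimization FLEET's MILP solver performs. This is a minimal illustration, not the paper's implementation: the task names, durations, precedence graph, and capability sets are hypothetical, and exhaustive search stands in for the MILP solver on this tiny instance.

```python
from itertools import product

# Hypothetical two-robot, four-task instance (all names and numbers are
# illustrative stand-ins, not data from the paper).
durations = {"A": 3, "B": 2, "C": 4, "D": 1}
prereqs = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}   # task -> prerequisites
capable = {"quad1": {"A", "C", "D"}, "quad2": {"B", "C", "D"}}  # robot -> feasible tasks
topo_order = ["A", "B", "C", "D"]  # a topological order of the task graph

def makespan(assign):
    """List-schedule tasks in topological order; return the max finish time.

    Each task starts once all prerequisites are done and its assigned
    robot is free; robots execute their tasks sequentially.
    """
    finish, robot_free = {}, {r: 0 for r in capable}
    for t in topo_order:
        r = assign[t]
        start = max([robot_free[r]] + [finish[p] for p in prereqs[t]])
        finish[t] = start + durations[t]
        robot_free[r] = finish[t]
    return max(finish.values())

def optimize():
    """Search all capability-feasible assignments for the minimum makespan."""
    best_time, best_assign = float("inf"), None
    for choice in product(capable, repeat=len(topo_order)):
        assign = dict(zip(topo_order, choice))
        if all(t in capable[r] for t, r in assign.items()):
            span = makespan(assign)
            if span < best_time:
                best_time, best_assign = span, assign
    return best_time, best_assign

best_time, best_assign = optimize()
print(best_time, best_assign)  # prints the optimal makespan and assignment
```

In a real MILP formulation the same structure appears as binary assignment variables, continuous start times, and big-M sequencing constraints, which scales far beyond what enumeration can handle; the capability matrix simply zeroes out infeasible robot–task pairings.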