🤖 AI Summary
Existing digital AI serving systems struggle to support the asynchronous pattern that physical AI requires, in which multi-step reasoning alternates with action execution, and they fail to meet the low-latency demands of large-scale robot fleets. This work proposes Kairos, the first physical AI serving system designed for multi-robot environments. Kairos treats the generate-and-execute loop as a first-class primitive and tightly co-schedules reasoning and actions. Its core innovations include asynchronous interleaved scheduling, chunked generation of multi-turn reasoning and actions, and a scalable cluster-serving architecture. Experiments across diverse physical AI models and robotic platforms show that Kairos reduces end-to-end task latency by 31.8%–66.5% compared with state-of-the-art digital AI serving approaches, with gains growing as the robot fleet size increases.
📝 Abstract
Physical AI is growing rapidly, with frontier foundation models extending its capabilities across general environments. Physical AI tasks have inference properties that differ markedly from those of digital AI: they consist of multiple rounds of inference and action execution, generate a chunk of actions in each inference round, and asynchronously interleave inference with execution. These properties make existing digital AI serving systems ill-suited to physical AI, a shortcoming that must be addressed for physical AI to be widely adopted, given the size of its models and the scale of the robot fleets they have to serve. To fill this gap, we design Kairos, the first multi-robot serving system that makes the generate-execute loop a first-class citizen and actively participates in the execution phase. Across a wide range of physical AI models and robots, Kairos reduces the average end-to-end task latency by 31.8–66.5% over state-of-the-art digital AI serving practices, with gains that scale with the robot fleet size.
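To make the asynchronous interleaving of inference and execution concrete, here is a toy timing model for a single robot. It is a sketch under stated assumptions, not Kairos's scheduler: per-round inference and execution times are fixed and hypothetical, the robot has a dedicated server with no contention from other robots, and inference for the next action chunk is assumed to start as soon as the current chunk begins executing. The names `LoopTiming`, `sequential_latency`, and `interleaved_latency` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class LoopTiming:
    infer_s: float  # hypothetical time for one inference round (produces one action chunk)
    exec_s: float   # hypothetical time for the robot to execute one action chunk
    rounds: int     # number of generate-execute rounds in the task


def sequential_latency(t: LoopTiming) -> float:
    # Blocking loop: generate a chunk, execute it, then generate the next one.
    return t.rounds * (t.infer_s + t.exec_s)


def interleaved_latency(t: LoopTiming) -> float:
    # Asynchronous loop (assumption): inference for chunk k+1 starts when
    # execution of chunk k starts, so the two overlap.
    chunk_ready = t.infer_s  # time at which the next chunk is available
    robot_free = 0.0         # time at which the robot finishes its current chunk
    for _ in range(t.rounds):
        start = max(chunk_ready, robot_free)  # need both a chunk and an idle robot
        robot_free = start + t.exec_s
        chunk_ready = start + t.infer_s       # next inference overlaps this execution
    return robot_free


if __name__ == "__main__":
    timing = LoopTiming(infer_s=0.3, exec_s=1.0, rounds=5)
    # prints: sequential=6.5s interleaved=5.3s
    print(f"sequential={sequential_latency(timing):.1f}s "
          f"interleaved={interleaved_latency(timing):.1f}s")
```

Under these assumptions the interleaved loop costs `infer + exec + (rounds - 1) * max(infer, exec)` rather than `rounds * (infer + exec)`. The sketch deliberately ignores batching and queueing across many robots sharing a server, which is the multi-robot setting the abstract targets.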