🤖 AI Summary
This work addresses the lack of real-time responsiveness, concurrency support, and dynamic adaptability in LLM-driven behavioral programming for embedded systems. The authors propose a streaming, concurrent embodied programming framework. Methodologically, they design a streaming XML function-tag parser, a dynamic interface mapper, and a multi-channel scheduler to enable incremental function invocation from LLM outputs and coordinated serial/parallel behavioral execution. By integrating streaming inference, multimodal input processing, and an embodied control architecture, the framework achieves real-time command parsing and synchronous/asynchronous multi-channel execution. Unlike conventional behavior trees or pre-planned paradigms, it natively supports dynamic, concurrent, and incremental behavior generation directly from LLM output. Evaluated on 34 real-world tasks, it achieves 0.85 behavioral accuracy (Claude-4 Sonnet), reduces end-to-end latency by up to 66× compared with native API function calling, and significantly improves robustness in long-horizon tasks and cross-task generalization.
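The core of the "streaming XML function-tag parser" idea is that a function call can be dispatched the moment its closing tag arrives in the token stream, rather than after the full LLM response completes. A minimal sketch of this pattern follows; the `<call name="...">` tag format and the `wave`/`speak` function names are illustrative assumptions, not the paper's actual schema:

```python
import re
from typing import Callable

class StreamingTagParser:
    """Accumulates streamed LLM tokens and emits each complete
    <call name="...">args</call> element as soon as its closing tag
    appears, so execution can begin mid-stream."""

    # Hypothetical tag grammar for illustration only.
    TAG = re.compile(r'<call name="(?P<name>[^"]+)">(?P<args>.*?)</call>', re.S)

    def __init__(self, on_call: Callable[[str, str], None]):
        self.buf = ""
        self.on_call = on_call  # fired once per completed element

    def feed(self, token: str) -> None:
        """Consume one streamed token fragment; dispatch any elements
        that just became complete, keeping partial text buffered."""
        self.buf += token
        while True:
            m = self.TAG.search(self.buf)
            if not m:
                return
            self.on_call(m.group("name"), m.group("args").strip())
            self.buf = self.buf[m.end():]  # drop the consumed element

calls = []
parser = StreamingTagParser(lambda name, args: calls.append((name, args)))
# Tokens arrive in arbitrary fragments, splitting tags mid-way:
for tok in ['<call name="wave', '">left_arm</call><call name="speak">hi', '</call>']:
    parser.feed(tok)
# calls == [("wave", "left_arm"), ("speak", "hi")]
```

Note that the first call fires as soon as its closing tag streams in, while the second is still incomplete — this incremental dispatch is what removes the wait for full-response parsing.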
📝 Abstract
We present GhostShell, a novel approach that leverages Large Language Models (LLMs) to enable streaming and concurrent behavioral programming for embodied systems. In contrast to conventional methods that rely on pre-scheduled action sequences or behavior trees, GhostShell drives embodied systems to act on-the-fly by issuing function calls incrementally as tokens are streamed from the LLM. GhostShell features a streaming XML function token parser, a dynamic function interface mapper, and a multi-channel scheduler that orchestrates intra-channel synchronous and inter-channel asynchronous function calls, thereby coordinating serial-parallel embodied actions across multiple robotic components as directed by the LLM. We evaluate GhostShell on our robot prototype COCO through comprehensive grounded experiments across 34 real-world interaction tasks and multiple LLMs. The results demonstrate that our approach achieves a state-of-the-art Behavioral Correctness Metric (BCM) score of 0.85 with Claude-4 Sonnet and up to 66× faster response times compared to native LLM function-calling APIs. GhostShell also proves effective in long-horizon multimodal tasks, demonstrating strong robustness and generalization.
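The scheduling discipline described above — synchronous ordering within a channel, asynchronous execution across channels — can be sketched with one worker thread and FIFO queue per robotic component. This is a minimal illustration under that assumption, not GhostShell's actual implementation; the channel names `arm` and `voice` are hypothetical:

```python
import threading
import queue
from typing import Callable, Iterable, Optional

class MultiChannelScheduler:
    """One worker thread per channel: calls submitted to the same channel
    run strictly in submission order (intra-channel synchronous), while
    different channels execute in parallel (inter-channel asynchronous)."""

    def __init__(self, channels: Iterable[str]):
        self.queues = {c: queue.Queue() for c in channels}
        self.workers = [
            threading.Thread(target=self._run, args=(c,), daemon=True)
            for c in self.queues
        ]
        for w in self.workers:
            w.start()

    def submit(self, channel: str, fn: Callable[[], None]) -> None:
        """Enqueue a function call on a channel; returns immediately."""
        self.queues[channel].put(fn)

    def _run(self, channel: str) -> None:
        q = self.queues[channel]
        while True:
            fn: Optional[Callable[[], None]] = q.get()
            if fn is None:  # sentinel: shut this channel down
                break
            fn()

    def shutdown(self) -> None:
        """Drain all queues, then stop the workers."""
        for q in self.queues.values():
            q.put(None)
        for w in self.workers:
            w.join()

log = []
sched = MultiChannelScheduler(["arm", "voice"])
sched.submit("arm", lambda: log.append("arm1"))
sched.submit("arm", lambda: log.append("arm2"))   # waits for arm1 on its channel
sched.submit("voice", lambda: log.append("say"))  # runs concurrently with the arm
sched.shutdown()
```

After `shutdown()`, `log` contains all three entries, with `arm1` guaranteed to precede `arm2`; the `voice` entry may interleave anywhere, which is exactly the serial-parallel behavior the abstract attributes to the scheduler.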