🤖 AI Summary
This work addresses the lack of real-time responsiveness, concurrency support, and dynamic adaptability in LLM-driven behavioral programming for embedded systems. The authors propose a streaming, concurrent embodied programming framework. Methodologically, they design a streaming XML function-tag parser, a dynamic interface mapper, and a multi-channel scheduler to enable incremental function invocation from LLM outputs and coordinated serial/parallel behavioral execution. By integrating streaming inference, multimodal input processing, and an embodied control architecture, the framework achieves real-time command parsing and synchronous/asynchronous multi-channel execution. Unlike conventional behavior trees or pre-planned paradigms, it natively supports dynamic, concurrent, and incremental behavior generation directly from LLM output. Evaluated on 34 real-world tasks, it achieves 0.85 behavioral accuracy (Claude-4 Sonnet), reduces end-to-end latency by up to 66× compared with native API function calling, and significantly improves robustness in long-horizon tasks and cross-task generalization.
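The core of the "streaming XML function-tag parser" idea is that a function call can be dispatched the moment its closing tag arrives in the token stream, rather than after the full LLM response completes. A minimal sketch of this pattern follows; the `<call name="...">` tag format and the `wave`/`speak` function names are illustrative assumptions, not the paper's actual schema:

```python
import re
from typing import Callable

class StreamingTagParser:
    """Accumulates streamed LLM tokens and emits each complete
    <call name="...">args</call> element as soon as its closing tag
    appears, so execution can begin mid-stream."""

    # Hypothetical tag grammar for illustration only.
    TAG = re.compile(r'<call name="(?P<name>[^"]+)">(?P<args>.*?)</call>', re.S)

    def __init__(self, on_call: Callable[[str, str], None]):
        self.buf = ""
        self.on_call = on_call  # fired once per completed element

    def feed(self, token: str) -> None:
        """Consume one streamed token fragment; dispatch any elements
        that just became complete, keeping partial text buffered."""
        self.buf += token
        while True:
            m = self.TAG.search(self.buf)
            if not m:
                return
            self.on_call(m.group("name"), m.group("args").strip())
            self.buf = self.buf[m.end():]  # drop the consumed element

calls = []
parser = StreamingTagParser(lambda name, args: calls.append((name, args)))
# Tokens arrive in arbitrary fragments, splitting tags mid-way:
for tok in ['<call name="wave', '">left_arm</call><call name="speak">hi', '</call>']:
    parser.feed(tok)
# calls == [("wave", "left_arm"), ("speak", "hi")]
```

Note that the first call fires as soon as its closing tag streams in, while the second is still incomplete — this incremental dispatch is what removes the wait for full-response parsing.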
📝 Abstract
We present GhostShell, a novel approach that leverages Large Language Models (LLMs) to enable streaming and concurrent behavioral programming for embodied systems. In contrast to conventional methods that rely on pre-scheduled action sequences or behavior trees, GhostShell drives embodied systems to act on-the-fly by issuing function calls incrementally as tokens are streamed from the LLM. GhostShell features a streaming XML function token parser, a dynamic function interface mapper, and a multi-channel scheduler that orchestrates intra-channel synchronous and inter-channel asynchronous function calls, thereby coordinating serial-parallel embodied actions across multiple robotic components as directed by the LLM. We evaluate GhostShell on our robot prototype COCO through comprehensive grounded experiments across 34 real-world interaction tasks and multiple LLMs. The results demonstrate that our approach achieves a state-of-the-art Behavioral Correctness Metric (BCM) score of 0.85 with Claude-4 Sonnet and up to 66× faster response times compared to native LLM function-calling APIs. GhostShell also proves effective in long-horizon multimodal tasks, demonstrating strong robustness and generalization.
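The scheduling discipline described above — synchronous ordering within a channel, asynchronous execution across channels — can be sketched with one worker thread and FIFO queue per robotic component. This is a minimal illustration under that assumption, not GhostShell's actual implementation; the channel names `arm` and `voice` are hypothetical:

```python
import threading
import queue
from typing import Callable, Iterable, Optional

class MultiChannelScheduler:
    """One worker thread per channel: calls submitted to the same channel
    run strictly in submission order (intra-channel synchronous), while
    different channels execute in parallel (inter-channel asynchronous)."""

    def __init__(self, channels: Iterable[str]):
        self.queues = {c: queue.Queue() for c in channels}
        self.workers = [
            threading.Thread(target=self._run, args=(c,), daemon=True)
            for c in self.queues
        ]
        for w in self.workers:
            w.start()

    def submit(self, channel: str, fn: Callable[[], None]) -> None:
        """Enqueue a function call on a channel; returns immediately."""
        self.queues[channel].put(fn)

    def _run(self, channel: str) -> None:
        q = self.queues[channel]
        while True:
            fn: Optional[Callable[[], None]] = q.get()
            if fn is None:  # sentinel: shut this channel down
                break
            fn()

    def shutdown(self) -> None:
        """Drain all queues, then stop the workers."""
        for q in self.queues.values():
            q.put(None)
        for w in self.workers:
            w.join()

log = []
sched = MultiChannelScheduler(["arm", "voice"])
sched.submit("arm", lambda: log.append("arm1"))
sched.submit("arm", lambda: log.append("arm2"))   # waits for arm1 on its channel
sched.submit("voice", lambda: log.append("say"))  # runs concurrently with the arm
sched.shutdown()
```

After `shutdown()`, `log` contains all three entries, with `arm1` guaranteed to precede `arm2`; the `voice` entry may interleave anywhere, which is exactly the serial-parallel behavior the abstract attributes to the scheduler.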