GhostShell: Streaming LLM Function Calls for Concurrent Embodied Programming

📅 2025-08-07
🤖 AI Summary
This work addresses the lack of real-time responsiveness, concurrency support, and dynamic adaptability in LLM-driven behavioral programming for embodied systems. We propose GhostShell, a streaming, concurrent embodied programming framework. It combines a streaming XML function-token parser, a dynamic function interface mapper, and a multi-channel scheduler, so that function calls are issued incrementally as tokens stream from the LLM, with calls executed synchronously within a channel and asynchronously across channels. Unlike behavior trees or pre-scheduled action sequences, the framework generates and executes behaviors on the fly, coordinating serial and parallel actions across multiple robotic components. Evaluated on the robot prototype COCO across 34 real-world interaction tasks and multiple LLMs, it achieves a Behavioral Correctness Metric of 0.85 (Claude-4 Sonnet) and up to 66× faster response times than LLM native function calling APIs, and it remains effective in long-horizon multimodal tasks.

📝 Abstract
We present GhostShell, a novel approach that leverages Large Language Models (LLMs) to enable streaming and concurrent behavioral programming for embodied systems. In contrast to conventional methods that rely on pre-scheduled action sequences or behavior trees, GhostShell drives embodied systems to act on the fly by issuing function calls incrementally as tokens are streamed from the LLM. GhostShell features a streaming XML function token parser, a dynamic function interface mapper, and a multi-channel scheduler that orchestrates intra-channel synchronous and inter-channel asynchronous function calls, thereby coordinating serial-parallel embodied actions across multiple robotic components as directed by the LLM. We evaluate GhostShell on our robot prototype COCO through comprehensive grounded experiments across 34 real-world interaction tasks and multiple LLMs. The results demonstrate that our approach achieves a state-of-the-art Behavioral Correctness Metric of 0.85 with Claude-4 Sonnet and up to 66× faster response times compared to LLM native function calling APIs. GhostShell also proves effective in long-horizon multimodal tasks, demonstrating strong robustness and generalization.
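To make the idea of issuing function calls incrementally from a token stream concrete, here is a minimal sketch of a streaming tag parser. The tag syntax `<channel:function arg="value" />`, the class name `StreamingTagParser`, and the returned dictionary layout are all assumptions made for illustration; this summary does not specify GhostShell's actual XML format or parser design.

```python
import re
from typing import Dict, List

# Hypothetical tag syntax (an assumption, not taken from the paper):
#   <channel:function arg="value" ... />
TAG = re.compile(r'<(\w+):(\w+)((?:\s+\w+="[^"]*")*)\s*/>')
ATTR = re.compile(r'(\w+)="([^"]*)"')


class StreamingTagParser:
    """Accumulates streamed text and emits each call once its tag is complete."""

    def __init__(self) -> None:
        self.buffer = ""

    def feed(self, chunk: str) -> List[Dict]:
        """Add one streamed chunk; return the calls completed by this chunk."""
        self.buffer += chunk
        calls = []
        consumed = 0
        for match in TAG.finditer(self.buffer):
            calls.append({
                "channel": match.group(1),
                "function": match.group(2),
                "args": dict(ATTR.findall(match.group(3))),
            })
            consumed = match.end()
        # Keep only text that may still be the start of an unfinished tag.
        rest = self.buffer[consumed:]
        start = rest.find("<")
        self.buffer = rest[start:] if start != -1 else ""
        return calls
```

Feeding the chunks `'Sure! <head:no'`, `'d times="2" /><arm:wave'`, and `' />'` in turn yields no call, then the `head` call, then the `arm` call, so an action can be dispatched as soon as its closing `/>` arrives rather than after the full response.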
Problem

Research questions and friction points this paper is trying to address.

Enables streaming concurrent behavioral programming for embodied systems
Replaces pre-scheduled actions with dynamic LLM-driven function calls
Coordinates serial-parallel actions across multiple robotic components
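The serial-parallel coordination described above can be sketched with one queue per channel: calls on the same channel run one after another, while different channels run concurrently. The class `MultiChannelScheduler`, the channel names, and the use of `asyncio.sleep` as a stand-in for a robot action are hypothetical; the paper's scheduler may be structured differently.

```python
import asyncio


class MultiChannelScheduler:
    """One FIFO queue per channel: serial within a channel, parallel across channels."""

    def __init__(self, channels):
        self.queues = {name: asyncio.Queue() for name in channels}
        self.log = []  # (event, channel, action) tuples in the order they occur

    def submit(self, channel, action, duration):
        """Queue an action; `duration` simulates how long the hardware takes."""
        self.queues[channel].put_nowait((action, duration))

    async def _worker(self, name):
        queue = self.queues[name]
        while True:
            item = await queue.get()
            if item is None:  # sentinel: no more actions for this channel
                break
            action, duration = item
            self.log.append(("start", name, action))
            await asyncio.sleep(duration)  # placeholder for a real actuator call
            self.log.append(("end", name, action))

    async def run(self):
        """Close every queue with a sentinel, then drain all channels concurrently."""
        for queue in self.queues.values():
            queue.put_nowait(None)
        await asyncio.gather(*(self._worker(name) for name in self.queues))


async def demo():
    scheduler = MultiChannelScheduler(["head", "arm"])
    scheduler.submit("head", "nod", 0.02)
    scheduler.submit("head", "shake", 0.02)  # waits for "nod" on the same channel
    scheduler.submit("arm", "wave", 0.03)    # overlaps with the head actions
    await scheduler.run()
    return scheduler.log
```

Running `asyncio.run(demo())` returns a log in which the arm's `wave` starts before the head's `nod` ends, while `shake` starts only after `nod` has ended. A streaming system would submit actions while workers are already running; the sketch queues everything first only to stay short.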
Innovation

Methods, ideas, or system contributions that make the work stand out.

Streaming XML function token parser
Dynamic function interface mapper
Multi-channel scheduler for serial-parallel actions
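As a rough illustration of what a dynamic function interface mapper could look like, the sketch below registers Python callables per channel at runtime, derives a tag description that could be shown to the LLM, and dispatches parsed calls with string arguments coerced by type annotation. The names `InterfaceMapper`, `register`, `describe`, and `dispatch`, as well as the `<channel:function ... />` tag form, are assumptions for illustration and are not taken from the paper.

```python
import inspect


class InterfaceMapper:
    """Maps (channel, function name) pairs to callables registered at runtime."""

    def __init__(self):
        self._registry = {}

    def register(self, channel, func):
        """Expose `func` on `channel`; can be called at any time."""
        self._registry[(channel, func.__name__)] = func
        return func

    def describe(self):
        """Return one hypothetical tag template per registered function."""
        lines = []
        for (channel, name), func in self._registry.items():
            attrs = ""
            for param in inspect.signature(func).parameters.values():
                has_type = param.annotation is not param.empty
                type_name = param.annotation.__name__ if has_type else "str"
                attrs += f' {param.name}="{type_name}"'
            lines.append(f"<{channel}:{name}{attrs} />")
        return lines

    def dispatch(self, channel, function, args):
        """Call the registered function, casting string arguments by annotation.

        Only simple casters such as int, float, and str are handled correctly;
        bool("false") is True in Python, so booleans would need extra care.
        """
        func = self._registry[(channel, function)]
        params = inspect.signature(func).parameters
        kwargs = {}
        for name, raw in args.items():
            annotation = params[name].annotation
            caster = annotation if annotation is not inspect.Parameter.empty else str
            kwargs[name] = caster(raw)
        return func(**kwargs)


def nod(times: int = 1) -> str:
    # Placeholder for a real actuator command.
    return f"nod x{times}"


mapper = InterfaceMapper()
mapper.register("head", nod)
result = mapper.dispatch("head", "nod", {"times": "2"})
```

Here `mapper.describe()` produces the template `<head:nod times="int" />`, and the dispatch converts the streamed string `"2"` to an integer before calling `nod`. Keeping the registry mutable is one way to let the set of callable interfaces change while the system is running.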
Jian Gong
Leapwatt Robotics
Youwei Huang
Research Engineer at Institute of Intelligent Computing Technology, Suzhou, CAS
Robotics, Web3, Blockchain, Software Engineering, Deep Learning
Bo Yuan
PhD Student in Machine Learning, Georgia Institute of Technology
Markov chain Monte Carlo, Large Language Model
Ming Zhu
Leapwatt Robotics
Juncheng Zhan
Leapwatt Robotics
Jinke Wang
Leapwatt Robotics
Hang Shu
Leapwatt Robotics
Mingyue Xiong
Leapwatt Robotics
Yanjun Ye
Leapwatt Robotics
Yufan Zu
Leapwatt Robotics
Yang Zhou
Leapwatt Robotics
Yihan Ding
Leapwatt Robotics
Xuannian Chen
Leapwatt Robotics
Xingyu Lu
Leapwatt Robotics
Runjie Ban
Leapwatt Robotics
Bingchao Huang
Leapwatt Robotics
Fusen Liu
Leapwatt Robotics