Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

📅 2025-09-11
🤖 AI Summary
Traditional sequential inference architectures struggle to meet the real-time requirements of embodied AI, which needs high-frequency perception and action generation in dynamic environments. To address this, we propose Auras, an asynchronous inference framework that co-designs the algorithm and the system. Methodologically, Auras (1) decouples the perception and generation modules and runs them under controlled pipeline parallelism; and (2) introduces a shared public context that both modules access, which mitigates data staleness at higher parallelism while preserving decision accuracy. At the systems level, it incorporates lightweight scheduling and memory optimization to enable efficient end-to-end asynchronous inference. Experiments show that Auras improves throughput by 2.54× on average over the sequential baseline while achieving 102.7% of the original accuracy, overcoming the throughput constraints of sequential computation.

📝 Abstract
Embodied AI systems operate in dynamic environments, requiring seamless integration of perception and generation modules to handle high-frequency input and output demands. Traditional sequential computation patterns, while effective in ensuring accuracy, face significant limitations in achieving the "thinking" frequency that real-world applications require. In this work, we present Auras, an algorithm-system co-designed inference framework that optimizes the inference frequency of embodied AI agents. Auras disaggregates perception and generation and provides controlled pipeline parallelism for them to achieve high and stable throughput. To address the data staleness that arises as parallelism increases, Auras establishes a public context shared by perception and generation, thereby preserving the accuracy of embodied agents. Experimental results show that Auras improves throughput by 2.54× on average while achieving 102.7% of the original accuracy, demonstrating its efficacy in overcoming the constraints of sequential computation and providing high throughput.

Problem

Research questions and friction points this paper is trying to address.

Overcoming sequential computation limitations in Embodied AI
Enhancing perception-generation throughput for real-time applications
Addressing data staleness in parallel AI inference pipelines

Innovation

Methods, ideas, or system contributions that make the work stand out.

Disaggregates perception and generation modules
Uses asynchronous, pipeline-parallel execution
Establishes shared public context for accuracy
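
The three ideas above can be illustrated with a minimal sketch. It is not the Auras implementation, which this page does not describe in detail. It assumes one perception thread and one generation thread, and every name in it (`SharedContext`, `perceive`, `generate`, `run_pipeline`) is hypothetical. Perception publishes its newest result into a shared context, and generation always reads the freshest version rather than waiting in lockstep, which is one simple way to limit staleness when the two stages run asynchronously.

```python
import threading


class SharedContext:
    """Hypothetical shared 'public context': latest perception result plus a version."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0      # incremented on every publish
        self._features = None  # most recent perception output
        self._closed = False   # set once perception has finished

    def publish(self, features):
        """Overwrite the context with a fresher perception result."""
        with self._cond:
            self._version += 1
            self._features = features
            self._cond.notify_all()

    def close(self):
        """Signal that no further results will be published."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def wait_newer(self, seen_version):
        """Block until something newer than seen_version exists.

        Returns (version, features), or None once the context is closed
        and nothing newer is available.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: self._version > seen_version or self._closed
            )
            if self._version > seen_version:
                return self._version, self._features
            return None


def perceive(observation):
    """Stand-in for a perception model (assumption: any encoder would go here)."""
    return observation * 2


def generate(features):
    """Stand-in for an action generator (assumption: any decoder would go here)."""
    return features + 1


def run_pipeline(observations):
    """Run perception and generation concurrently over a shared context."""
    ctx = SharedContext()
    actions = []  # (context version used, generated action)

    def perception_worker():
        for obs in observations:
            ctx.publish(perceive(obs))
        ctx.close()

    def generation_worker():
        seen = 0
        while True:
            item = ctx.wait_newer(seen)
            if item is None:
                break
            seen, features = item  # stale intermediate versions are skipped
            actions.append((seen, generate(features)))

    workers = [
        threading.Thread(target=perception_worker),
        threading.Thread(target=generation_worker),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return actions


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4, 5]))
```

How many actions are produced depends on thread timing, because generation may skip versions that were overwritten before it read them. The final action, however, is always computed from the last published perception result. A sequential design would instead call `generate(perceive(obs))` once per observation, so each step would wait for both stages to finish.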