🤖 AI Summary
This work addresses the high latency and computational cost that embodied AI agents incur when they invoke a large language model (LLM) at every decision step. To mitigate this, the authors propose a cache-driven asynchronous planning mechanism that exploits the strong locality of task plans, where the next plan is largely predictable from the current one. During execution, the approach caches frequently occurring plan transitions, while background LLM calls asynchronously validate and update the cached entries. It is described as the first application of such a caching strategy in embodied AI, and it substantially reduces real-time reliance on LLMs. Across four multi-agent embodied benchmarks, the method improves task success rate by 22% on average, reduces simulation latency by 65%, and cuts token consumption by 50%.
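To make "caching frequently occurring plan transitions" concrete, here is a minimal sketch of such a cache, assuming plans are plain strings and that a transition is trusted once it has been observed at least `min_count` times. The class name `PlanTransitionCache` and the `min_count` threshold are hypothetical illustrations, not the authors' implementation. A lookup returns `None` on a miss, which is where an agent would fall back to a regular LLM call.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class PlanTransitionCache:
    """Hypothetical cache mapping a current plan to its most frequent next plan."""

    def __init__(self, min_count: int = 2) -> None:
        # Assumed threshold: a transition must be seen this often before reuse.
        self.min_count = min_count
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    def record(self, current_plan: str, next_plan: str) -> None:
        # Count every plan transition observed during execution.
        self._counts[current_plan][next_plan] += 1

    def lookup(self, current_plan: str) -> Optional[str]:
        # Return the most frequent next plan if it is frequent enough;
        # None signals a miss, where the agent would call the LLM instead.
        counter = self._counts.get(current_plan)
        if not counter:
            return None
        next_plan, count = counter.most_common(1)[0]
        return next_plan if count >= self.min_count else None


cache = PlanTransitionCache(min_count=2)
for _ in range(2):
    cache.record("pick up apple", "go to table")
cache.record("pick up apple", "open fridge")
print(cache.lookup("pick up apple"))  # go to table
print(cache.lookup("open door"))      # None (never observed)
```

The frequency threshold trades reuse rate against the risk of replaying a transition that was observed only once by chance.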
📝 Abstract
Embodied AI agents increasingly rely on large language models (LLMs) for planning, yet per-step LLM calls impose severe latency and cost. In this paper, we show that embodied tasks exhibit strong plan locality, where the next plan is largely predictable from the current one. Building on this, we introduce AgenticCache, a planning framework that reuses cached plans to avoid per-step LLM calls. In AgenticCache, each agent queries a runtime cache of frequent plan transitions, while a background Cache Updater asynchronously calls the LLM to validate and refine cached entries. Across four multi-agent embodied benchmarks, AgenticCache improves task success rate by 22% on average across 12 configurations (4 benchmarks × 3 models), reduces simulation latency by 65%, and lowers token usage by 50%. Cache-based plan reuse thus offers a practical path to low-latency, low-cost embodied agents. Code is available at https://github.com/hojoonleokim/MLSys26_AgenticCache.
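The split between a fast agent-side lookup and a background Cache Updater can be sketched with a worker thread and a queue. This is a minimal illustration under stated assumptions, not AgenticCache's actual design. The class `AsyncPlanCache` and the `llm_plan` callable are hypothetical: `llm_plan` stands in for a real LLM planning call. On a cache hit, the agent acts on the cached plan immediately and enqueues the entry for background re-validation. On a miss, it blocks on the LLM once and stores the result.

```python
import queue
import threading
from typing import Callable, Dict


class AsyncPlanCache:
    """Hypothetical sketch: agents read cached plans; a worker thread refreshes them."""

    def __init__(self, llm_plan: Callable[[str], str]) -> None:
        self.llm_plan = llm_plan  # assumed stand-in for an LLM planning call
        self.cache: Dict[str, str] = {}
        self.lock = threading.Lock()
        self.pending: "queue.Queue[str]" = queue.Queue()
        self.llm_calls = 0
        # Background "cache updater": a daemon thread draining the pending queue.
        self.worker = threading.Thread(target=self._update_loop, daemon=True)
        self.worker.start()

    def next_plan(self, current_plan: str) -> str:
        with self.lock:
            cached = self.cache.get(current_plan)
        if cached is not None:
            # Hit: act now, and ask the updater to re-check this entry later.
            self.pending.put(current_plan)
            return cached
        # Miss: fall back to a blocking LLM call and cache its answer.
        plan = self._call_llm(current_plan)
        with self.lock:
            self.cache[current_plan] = plan
        return plan

    def _call_llm(self, current_plan: str) -> str:
        with self.lock:
            self.llm_calls += 1
        return self.llm_plan(current_plan)

    def _update_loop(self) -> None:
        while True:
            current_plan = self.pending.get()
            # Overwrite the entry with the LLM's current answer: unchanged if the
            # cached plan was still valid, refined otherwise.
            refined = self._call_llm(current_plan)
            with self.lock:
                self.cache[current_plan] = refined
            self.pending.task_done()


agent_cache = AsyncPlanCache(llm_plan=lambda plan: f"next({plan})")
print(agent_cache.next_plan("explore"))  # miss: blocking call, prints next(explore)
print(agent_cache.next_plan("explore"))  # hit: returned at once, checked in background
agent_cache.pending.join()  # wait for the updater (demonstration only)
print(agent_cache.llm_calls)  # 2
```

In this sketch, the latency saving comes from the hit path never waiting on the LLM. The LLM is still called on hits, but off the critical path, so a real system would likely also need to rate-limit or batch background validations to realize token savings.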