KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning

📅 2026-02-27
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenges of excessive prompt length and high prefill latency in existing text-based memory approaches for embodied planning, as well as the inefficiency of current KV cache reuse strategies caused by frequent updates. To overcome these limitations, the authors propose a KV-cache-centric memory management system with three key innovations: a static-dynamic hybrid-granularity memory construction, a multi-hop memory recomputation mechanism, and a hierarchical balanced loading strategy. These components collectively mitigate redundant recomputation and load imbalance in the cache. Experimental results on the ALFRED dataset demonstrate that the proposed method achieves a 2.68× speedup over text-based memory with negligible accuracy loss, and outperforms CacheBlend with a 4.13% higher task success rate while reducing first-token generation time by 1.90×.

Technology Category

Application Category

๐Ÿ“ Abstract
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable capability for complex and long-horizon embodied planning. By keeping track of past experiences and environmental states, memory enables LLMs to maintain a global view, thereby avoiding repetitive exploration. However, existing approaches often store the memory as raw text, leading to excessively long prompts and high prefill latency. While it is possible to store and reuse the KV caches, the efficiency benefits are greatly undermined by frequent KV cache updates. In this paper, we propose KEEP, a KV-cache-centric memory management system for efficient embodied planning. KEEP features three key innovations: (1) a Static-Dynamic Memory Construction algorithm that reduces KV cache recomputation through mixed-granularity memory groups; (2) a Multi-hop Memory Re-computation algorithm that dynamically identifies important cross-attention among different memory groups and reconstructs memory interactions iteratively; (3) a Layer-balanced Memory Loading strategy that eliminates unbalanced KV cache loading and cross-attention computation across different layers. Extensive experimental results demonstrate that KEEP achieves a 2.68x speedup with negligible accuracy loss compared with text-based memory methods on the ALFRED dataset. Compared with the KV re-computation method CacheBlend (EuroSys'25), KEEP shows a 4.13% success rate improvement and a 1.90x time-to-first-token (TTFT) reduction. Our code is available at https://github.com/PKU-SEC-Lab/KEEP_Embodied_Memory.
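The core idea behind the abstract's static-dynamic memory construction can be illustrated with a toy sketch: KV entries for stable memory groups (e.g., the task goal) are computed once and reused, while only groups whose content changes trigger recomputation. This is not the paper's implementation; the class name, group keys, and the placeholder "KV" values are all hypothetical, standing in for real per-layer key/value tensors produced by prefill.

```python
import hashlib

class KVCacheStore:
    """Toy store illustrating mixed-granularity KV reuse:
    static groups are prefilled once and reused; dynamic
    groups are recomputed only when their text changes."""

    def __init__(self):
        self.cache = {}           # group_id -> (content_hash, kv)
        self.recompute_count = 0  # counts expensive prefill calls

    def _compute_kv(self, text):
        # Stand-in for the expensive prefill that builds KV pairs.
        self.recompute_count += 1
        return [ord(c) for c in text]  # placeholder "KV" values

    def get_kv(self, group_id, text):
        h = hashlib.sha256(text.encode()).hexdigest()
        cached = self.cache.get(group_id)
        if cached and cached[0] == h:
            return cached[1]             # cache hit: reuse stored KV
        kv = self._compute_kv(text)      # miss or stale: recompute
        self.cache[group_id] = (h, kv)
        return kv

store = KVCacheStore()
store.get_kv("task_goal", "put the mug in the sink")  # static: computed once
store.get_kv("env_state", "mug on table")             # dynamic: computed
store.get_kv("task_goal", "put the mug in the sink")  # reused, no recompute
store.get_kv("env_state", "mug in hand")              # changed: recomputed
print(store.recompute_count)  # 3
```

Only three prefill calls occur for four lookups, since the unchanged static group is served from the cache; KEEP's actual contribution additionally handles cross-attention between reused groups, which this sketch omits.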
Problem

Research questions and friction points this paper is trying to address.

KV cache
memory management
embodied planning
Large Language Models
prefill latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

KV-cache
memory management
embodied planning
multi-hop re-computation
layer-balanced loading
🔎 Similar Papers
No similar papers found.