Pancake: Hierarchical Memory System for Multi-Agent LLM Serving

📅 2026-02-24

🤖 AI Summary
This work addresses the high-cost approximate nearest neighbor (ANN) search challenges in large language model–based multi-agent systems, which arise from massive storage demands, frequent memory updates, and concurrent agent coexistence. To tackle this, we propose a hierarchical agent memory system that uniquely integrates multi-level index caching, cross-agent collaborative index management, and GPU-CPU heterogeneous computing acceleration within a unified framework. The design is compatible with mainstream agent platforms such as LangChain and LlamaIndex. Experimental evaluation under realistic multi-agent workloads demonstrates that our system achieves over 4.29× higher end-to-end throughput compared to existing approaches, significantly reducing ANN retrieval overhead while enhancing overall system efficiency.

📝 Abstract
In this work, we identify and address the core challenges of agentic memory management in LLM serving, where large-scale storage, frequent updates, and multiple coexisting agents jointly make approximate nearest neighbor (ANN) search complex and costly. We present Pancake, a multi-tier agentic memory system that unifies three key techniques: (i) multi-level index caching for single agents, (ii) coordinated index management across multiple agents, and (iii) collaborative GPU-CPU acceleration. Pancake exposes an easy-to-use interface that can be integrated into memory-based agents like Mem-GPT, and is compatible with agentic frameworks such as LangChain and LlamaIndex. Experiments on realistic agent workloads show that Pancake substantially outperforms existing frameworks, achieving more than 4.29x end-to-end throughput improvement.
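The abstract names multi-level index caching without describing its mechanics, so the following is only a rough sketch of the general idea, not Pancake's actual design. It assumes a hypothetical `TwoTierMemory` class with a small LRU "hot" tier that is searched first and a complete "cold" tier that is searched on a miss. The class name, the `hot_capacity` and `hit_threshold` parameters, and the exact linear scan (standing in for a real ANN index) are all illustrative assumptions.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoTierMemory:
    """Hypothetical two-tier store: a small LRU hot tier over a full cold tier."""

    def __init__(self, hot_capacity=2, hit_threshold=0.9):
        self.cold = {}            # key -> vector, the complete set of memories
        self.hot = OrderedDict()  # key -> vector, recently retrieved subset
        self.hot_capacity = hot_capacity
        self.hit_threshold = hit_threshold

    def insert(self, key, vector):
        """Write to the cold tier and drop any stale hot copy."""
        self.cold[key] = vector
        self.hot.pop(key, None)

    @staticmethod
    def _best(tier, query):
        """Linear scan of one tier (a stand-in for a real ANN index)."""
        if not tier:
            return None, -1.0
        key = max(tier, key=lambda k: cosine(tier[k], query))
        return key, cosine(tier[key], query)

    def search(self, query):
        """Return (key, tier_name); tier_name is 'hot', 'cold', or 'miss'."""
        key, score = self._best(self.hot, query)
        if key is not None and score >= self.hit_threshold:
            self.hot.move_to_end(key)  # refresh LRU position
            return key, "hot"
        key, score = self._best(self.cold, query)
        if key is None:
            return None, "miss"
        self.hot[key] = self.cold[key]  # promote the result to the hot tier
        self.hot.move_to_end(key)
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict the least recently used entry
        return key, "cold"


mem = TwoTierMemory(hot_capacity=2, hit_threshold=0.9)
mem.insert("a", [1.0, 0.0])
mem.insert("b", [0.0, 1.0])
mem.insert("c", [1.0, 1.0])
first = mem.search([1.0, 0.1])    # hot tier is empty, so this scans the cold tier
second = mem.search([1.0, 0.05])  # a similar query is now answered by the hot tier
```

In this sketch a hot-tier hit is accepted as soon as its similarity clears `hit_threshold`, so the answer may not be the global nearest neighbor. That is the usual trade-off of answering from a cache: lower search cost in exchange for approximate results.
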
Problem

Research questions and friction points this paper is trying to address.

agentic memory management
approximate nearest neighbor search
multi-agent LLM serving
memory system
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical memory system
multi-agent LLM serving
approximate nearest neighbor search
index caching
GPU-CPU acceleration