AI Summary
This work addresses the high-cost approximate nearest neighbor (ANN) search challenges in large language model-based multi-agent systems, which arise from massive storage demands, frequent memory updates, and concurrent agent coexistence. To tackle this, we propose a hierarchical agent memory system that integrates multi-level index caching, cross-agent collaborative index management, and GPU-CPU heterogeneous computing acceleration within a unified framework. The design is compatible with mainstream agent platforms such as LangChain and LlamaIndex. Experimental evaluation under realistic multi-agent workloads demonstrates that our system achieves over 4.29× higher end-to-end throughput than existing approaches, significantly reducing ANN retrieval overhead while improving overall system efficiency.
Abstract
In this work, we identify and address the core challenges of agentic memory management in LLM serving, where large-scale storage, frequent updates, and multiple coexisting agents jointly introduce complex and high-cost approximate nearest neighbor (ANN) search problems. We present Pancake, a multi-tier agentic memory system that unifies three key techniques: (i) multi-level index caching for single agents, (ii) coordinated index management across multiple agents, and (iii) collaborative GPU-CPU acceleration. Pancake exposes an easy-to-use interface that can be integrated into memory-based agents such as Mem-GPT, and is compatible with agentic frameworks such as LangChain and LlamaIndex. Experiments on realistic agent workloads show that Pancake substantially outperforms existing frameworks, achieving more than a 4.29× end-to-end throughput improvement.
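To make the multi-tier memory idea concrete, the sketch below shows a toy two-tier agent memory: recent vectors live in a small "hot" tier, older ones are demoted to a "cold" tier, and a query searches both and merges the top-k by cosine similarity. All names (`TieredMemory`, `hot_capacity`) are illustrative assumptions for exposition, not Pancake's actual API, and the exhaustive scan stands in for the real ANN index.

```python
# Illustrative two-tier agent memory sketch; not Pancake's real interface.
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


class TieredMemory:
    """Hypothetical hot/cold tiered vector store for a single agent."""

    def __init__(self, hot_capacity=4):
        self.hot = []    # recently inserted (fast tier)
        self.cold = []   # demoted older entries (slow tier)
        self.hot_capacity = hot_capacity

    def add(self, vec, payload):
        # New memories enter the hot tier; the oldest is demoted on overflow.
        self.hot.append((vec, payload))
        if len(self.hot) > self.hot_capacity:
            self.cold.append(self.hot.pop(0))

    def search(self, query, k=2):
        # A real system would query a cached ANN index per tier;
        # here we scan both tiers exhaustively and merge top-k.
        candidates = self.hot + self.cold
        scored = [(cosine(query, v), p) for v, p in candidates]
        return heapq.nlargest(k, scored, key=lambda t: t[0])


mem = TieredMemory(hot_capacity=2)
mem.add([1.0, 0.0], "x-axis")
mem.add([0.0, 1.0], "y-axis")   # pushes "x-axis" into the cold tier
mem.add([1.0, 1.0], "diag")
top = mem.search([1.0, 0.1], k=1)
print(top[0][1])  # the cold-tier entry "x-axis" is still found
```

The point of the sketch is that demotion changes *where* a memory lives, not *whether* it is retrievable; the paper's multi-level index caching plays the analogous role of keeping hot index shards cheap to query while cold ones remain reachable.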