Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of large language models in long multi-session dialogue. Once a conversation outgrows the context window, older turns must be evicted, and the model loses critical historical information that it can no longer retrieve on demand. The authors propose cooperative memory paging, which replaces each evicted dialogue segment with a short keyword bookmark (about 8-24 tokens) and gives the model a recall() tool to fetch the full segment when it needs it. The approach pairs compact keyword bookmarks with model-driven recall, improving long-dialogue question answering while keeping the in-context footprint of evicted content small. On the LoCoMo benchmark, the method achieves the highest answer quality among six methods, outperforming five baselines (truncation, BM25, word-overlap retrieval, a search-tool baseline, and full context) on four mainstream models (p = 0.017, paired bootstrap). The specificity of bookmark keywords alone accounts for a 25 percentage point difference in accuracy.
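The bookmark-and-recall mechanism can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name CooperativePager, the keyword heuristic (most frequent content words minus a small hypothetical stopword list), pages measured in turns, and FIFO eviction with a cap on resident pages are all choices made for brevity. Only the [pN:keywords] bookmark format and the recall() tool come from the paper's description.

```python
import re
from collections import Counter

# Hypothetical stopword list for the illustrative keyword heuristic below.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "you", "your",
             "was", "are", "have", "just", "about", "from", "what"}


def make_bookmark(page_id: int, turns: list[str], k: int = 3) -> str:
    """Return a [pN:keywords] stub built from the k most frequent content words."""
    words = re.findall(r"[a-z0-9']+", " ".join(turns).lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    keywords = [w for w, _ in counts.most_common(k)]  # ties keep first-seen order
    return f"[p{page_id}:{','.join(keywords)}]"


class CooperativePager:
    """Fixed-size pages, FIFO eviction, bookmarks for evicted pages, recall() on demand."""

    def __init__(self, page_size: int = 20, max_resident: int = 2):
        if page_size < 1 or max_resident < 1:
            raise ValueError("page_size and max_resident must be at least 1")
        self.page_size = page_size        # turns per page (assumed unit)
        self.max_resident = max_resident  # pages kept verbatim in the prompt
        self.pages: list[list[str]] = []  # full text of every page is retained
        self.bookmarks: list[str] = []    # bookmarks[i] replaces evicted page i

    def add_turn(self, turn: str) -> None:
        if not self.pages or len(self.pages[-1]) >= self.page_size:
            self.pages.append([])
            # FIFO: evict the oldest resident page once too many are resident.
            # The evicted page is never the new one, so its text is complete.
            if len(self.pages) - len(self.bookmarks) > self.max_resident:
                victim = len(self.bookmarks)
                self.bookmarks.append(make_bookmark(victim, self.pages[victim]))
        self.pages[-1].append(turn)

    def context(self) -> str:
        """Prompt text: one bookmark per evicted page, then resident turns verbatim."""
        resident = [t for page in self.pages[len(self.bookmarks):] for t in page]
        return "\n".join(self.bookmarks + resident)

    def recall(self, page_id: int) -> str:
        """Tool exposed to the model: return the full text of an evicted page."""
        if not 0 <= page_id < len(self.bookmarks):
            return f"error: p{page_id} is not an evicted page"
        return "\n".join(self.pages[page_id])


# Usage: two-turn pages, one resident page, so page 0 is evicted at turn 3.
pager = CooperativePager(page_size=2, max_resident=1)
for turn in ["I adopted a dog named Biscuit", "Biscuit loves the beach",
             "We moved to Lisbon in March", "Lisbon has great food"]:
    pager.add_turn(turn)
print(pager.context())  # bookmark for p0, then the two Lisbon turns verbatim
print(pager.recall(0))  # full text of the evicted page
```

A raw frequency heuristic like this one can pick generic words, which is exactly the failure mode the paper highlights: keyword specificity alone accounts for a 25 percentage point accuracy difference.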

📝 Abstract

When LLM conversations grow beyond the context window, old content must be evicted, but how does the model recover it when needed? We propose cooperative paging: evicted segments are replaced with minimal keyword bookmarks ([pN:keywords], ~8-24 tokens each), and the model is given a recall() tool to retrieve full content on demand. On the LoCoMo benchmark (10 real multi-session conversations, 300+ turns), cooperative paging achieves the highest answer quality among six methods, outperforming truncation, BM25, word-overlap retrieval, a search-tool baseline, and full context on four models (GPT-4o-mini, DeepSeek-v3.2, Claude Haiku, GLM-5), as confirmed by four independent LLM judges (p = 0.017, paired bootstrap). We then study the paging design space with a 5×4 ablation over boundary strategies and eviction policies (3,176 synthetic probes, 1,600 LoCoMo probes). Key findings: (1) coarse fixed-size pages (fixed_20) reach 96.7%, while content-aware topic_shift collapses to 56.7%; (2) the best eviction policy is data-dependent (FIFO on synthetic data, LFU on LoCoMo); (3) two bookmark generation strategies improve over the heuristic baseline (+4.4 and +8.7 E2E points); (4) the remaining bottleneck is bookmark discrimination: the model triggers recall() 96% of the time but selects the correct page only 57% of the time when bookmarks are insufficiently distinctive. Keyword specificity alone accounts for a 25 percentage point accuracy difference.
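The significance claim above rests on a paired bootstrap, but the abstract does not specify the variant. Below is a minimal sketch of one common one-sided version, assuming per-question scores (for example, 0/1 judge verdicts) for two methods evaluated on the same questions. The function name, the 10,000-resample default, and the toy scores are hypothetical, and the paper may use a different statistic (for example, a two-sided test or one that aggregates across models and judges).

```python
import random


def paired_bootstrap_p(scores_a: list[float], scores_b: list[float],
                       n_resamples: int = 10_000, seed: int = 0) -> float:
    """One-sided paired bootstrap estimate of P(mean(A - B) <= 0).

    Each resample draws questions with replacement and keeps every question's
    A and B scores together, so per-question difficulty cancels in A - B.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        resample = rng.choices(diffs, k=n)  # per-question differences, with replacement
        if sum(resample) / n <= 0:
            not_better += 1
    return not_better / n_resamples


# Usage: hypothetical 0/1 judge scores for two methods on the same ten questions.
method_a = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
method_b = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
print(paired_bootstrap_p(method_a, method_b))
```

Pairing is the key design choice: resampling each method's scores independently would ignore that both methods answer the same questions, which, when scores are positively correlated across questions, inflates the variance of the difference and weakens the test.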

Problem

Research questions and friction points this paper is trying to address.

long-horizon conversations

context window limitation

memory paging

information retrieval

keyword bookmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

cooperative paging

keyword bookmarks

long-horizon conversations