Panini: Continual Learning in Token Space via Structured Memory

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of conventional retrieval-augmented generation (RAG) methods, which redundantly reprocess documents during inference and often inject irrelevant context, leading to unreliable answers. The authors propose a human-like, non-parametric continual learning framework that keeps the base model fixed and constructs an external semantic memory via a Generative Semantic Workspace (GSW). New experiences are structured into an entity- and event-aware network of question-answer pairs, enabling chain-of-thought knowledge retrieval. By organizing knowledge efficiently at write time and enabling precise reasoning at read time, the approach achieves average performance gains of 5%–7% across six question-answering benchmarks, reduces context token usage by 2–30×, substantially decreases unreliable responses, and provides a fully open-source pipeline.

📝 Abstract
Language models are increasingly used to reason over content they were not trained on, such as new documents, evolving knowledge, and user-specific data. A common approach is retrieval-augmented generation (RAG), which stores verbatim documents externally (as chunks) and retrieves only a relevant subset at inference time for an LLM to reason over. However, this results in inefficient usage of test-time compute (LLM repeatedly reasons over the same documents); moreover, chunk retrieval can inject irrelevant context that increases unsupported generation. We propose a human-like non-parametric continual learning framework, where the base model remains fixed, and learning occurs by integrating each new experience into an external semantic memory state that accumulates and consolidates itself continually. We present Panini, which realizes this by representing documents as Generative Semantic Workspaces (GSW) -- an entity- and event-aware network of question-answer (QA) pairs, sufficient for an LLM to reconstruct the experienced situations and mine latent knowledge via reasoning-grounded inference chains on the network. Given a query, Panini only traverses the continually-updated GSW (not the verbatim documents or chunks), and retrieves the most likely inference chains. Across six QA benchmarks, Panini achieves the highest average performance, 5%-7% higher than other competitive baselines, while using 2-30x fewer answer-context tokens, supports fully open-source pipelines, and reduces unsupported answers on curated unanswerable queries. The results show that efficient and accurate structuring of experiences at write time -- as achieved by the GSW framework -- yields both efficiency and reliability gains at read time. Code is available at https://github.com/roychowdhuryresearch/gsw-memory.
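To make the write-time/read-time split concrete, here is a minimal toy sketch of an entity-indexed QA memory in the spirit of the GSW described above. All names (`SemanticWorkspace`, `write`, `read`) are hypothetical illustrations, not the paper's actual implementation; see the linked repository for the real pipeline.

```python
from collections import defaultdict

class SemanticWorkspace:
    """Toy entity-indexed QA memory (illustrative only, not the paper's GSW)."""

    def __init__(self):
        # entity -> list of (question, answer) pairs that mention it
        self.index = defaultdict(list)

    def write(self, question, answer, entities):
        # Write time: structure the new experience as a QA pair,
        # indexed under every entity it involves.
        for entity in entities:
            self.index[entity].append((question, answer))

    def read(self, entities, hops=2):
        # Read time: traverse only the memory network (never raw documents),
        # following entities surfaced in answers to build an inference chain.
        chain, frontier, seen = [], list(entities), set()
        for _ in range(hops):
            next_frontier = []
            for entity in frontier:
                if entity in seen:
                    continue
                seen.add(entity)
                for q, a in self.index.get(entity, []):
                    if (q, a) not in chain:
                        chain.append((q, a))
                    # follow any indexed entity mentioned in the answer
                    next_frontier += [e for e in self.index if e in a]
            frontier = next_frontier
        return chain

gsw = SemanticWorkspace()
gsw.write("Where does Ada work?", "Ada works at Acme Labs.", ["Ada", "Acme Labs"])
gsw.write("What does Acme Labs build?", "Acme Labs builds telescopes.", ["Acme Labs"])
print(gsw.read(["Ada"]))
```

A query about "Ada" retrieves both QA pairs via a two-hop chain through "Acme Labs", illustrating how read-time traversal of the structured memory can surface latent knowledge without re-reading source documents.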
Problem

Research questions and friction points this paper is trying to address.

retrieval-augmented generation
continual learning
semantic memory
language models
unsupported generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning
structured memory
generative semantic workspace
retrieval-augmented generation
reasoning-grounded inference
Shreyas Rajesh
Department of Electrical and Computer Engineering, University of California, Los Angeles, USA
Pavan Holur
Department of Electrical and Computer Engineering, University of California, Los Angeles, USA
Mehmet Yigit Turali
Department of Electrical and Computer Engineering, University of California, Los Angeles, USA
Chenda Duan
Ph.D. Student, University of California, Los Angeles
AI for Science, Multimodal LLMs, Autonomous Agents
Vwani Roychowdhury
Professor of Electrical and Computer Engineering, UCLA
Brain-Inspired AI, Quantum Computing, Physics and AI, Computational Narratology, Machine Learning