TokMem: Tokenized Procedural Memory for Large Language Models

📅 2025-09-30
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) rely on repetitive prompting for task specification and reasoning, which suffers from inherent limitations: low inference efficiency, poor cross-task generalization, and a lack of modular reusability. To address these issues, we propose TokMem, a trainable tokenized procedural memory mechanism for LLMs. TokMem encodes procedural knowledge as addressable, control-aware memory tokens; enables prompt-free invocation via compact token embeddings, frozen backbone weights, and dynamic memory scheduling; and supports continual expansion and compositional reuse of memory modules without interfering with the base model's functionality. Evaluated on a benchmark comprising 1,000 atomic-recall and function-calling reasoning tasks, TokMem significantly outperforms retrieval-augmented generation (RAG), reducing context overhead by 42%, requiring only 1/10 the parameters of fine-tuning baselines, and achieving higher accuracy.

๐Ÿ“ Abstract
Large language models rely heavily on prompts to specify tasks, recall knowledge and guide reasoning. However, this reliance is inefficient as prompts must be re-read at each step, scale poorly across tasks, and lack mechanisms for modular reuse. We introduce TokMem, a tokenized procedural memory that stores recurring procedures as compact, trainable embeddings. Each memory token encodes both an address to a procedure and a control signal that steers generation, enabling targeted behavior with constant-size overhead. To support continual adaptation, TokMem keeps the backbone model frozen, allowing new procedures to be added without interfering with existing ones. We evaluate TokMem on 1,000 tasks for atomic recall, and on function-calling tasks for compositional recall, where it consistently outperforms retrieval-augmented generation while avoiding repeated context overhead, and fine-tuning with far fewer parameters. These results establish TokMem as a scalable and modular alternative to prompt engineering and fine-tuning, offering an explicit procedural memory for LLMs.
Problem

Research questions and friction points this paper is trying to address.

Eliminates repetitive prompt re-reading during generation
Enables modular procedure reuse across multiple tasks
Reduces parameter overhead compared to fine-tuning methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tokenized procedural memory stores procedures as embeddings
Memory tokens encode procedure addresses and control signals
Frozen backbone model enables continual adaptation without interference
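The core mechanism above, trainable memory-token embeddings prepended to a frozen backbone so new procedures can be added without disturbing old ones, can be illustrated with a minimal sketch. This is not the paper's implementation; the names `add_procedure` and `build_input`, the embedding dimension, and the dictionary-based registry are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension (illustrative)

# Frozen backbone embedding table: never updated after pretraining.
backbone_emb = rng.normal(size=(100, D))
backbone_emb.flags.writeable = False  # emulate frozen weights

# Procedural memory: one trainable token embedding per procedure,
# acting as both an address and a control signal (hypothetical API).
memory = {}

def add_procedure(name):
    # New procedures are appended without modifying existing entries,
    # so previously learned behavior is untouched (continual expansion).
    memory[name] = rng.normal(size=D)

def build_input(proc_name, token_ids):
    # Prepend the memory token to the prompt embeddings: constant-size
    # overhead (one vector) instead of re-reading a long instruction prompt.
    mem_vec = memory[proc_name][None, :]
    tok_vecs = backbone_emb[token_ids]
    return np.concatenate([mem_vec, tok_vecs], axis=0)

add_procedure("summarize")
add_procedure("call_api")
x = build_input("summarize", [3, 14, 15])
print(x.shape)  # one memory token + three prompt tokens -> (4, 8)
```

In training, only the vectors in `memory` would receive gradients, which is why the parameter count stays a small fraction of full fine-tuning.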
🔎 Similar Papers
No similar papers found.