🤖 AI Summary
Large language models are constrained by limited context windows, which hinders their ability to store and flexibly retrieve long-term memory in extended conversations. To address this, this work proposes a novel memory retrieval framework that integrates tool calling with autonomous decision-making. The approach builds a multi-index memory repository through semantically adaptive chunking and structured note extraction, and employs an LLM agent that dynamically invokes tools such as key-based lookup and vector retrieval to enable on-demand, iterative retrieval and reasoning. This design avoids the rigidity of static or fixed-pipeline retrieval strategies, achieves significant improvements over existing baselines on the LoCoMo dataset, and adapts its tool use to different question types.
📝 Abstract
Large language models (LLMs) have exhibited strong reasoning ability in text-based contexts across various domains, yet their limited context windows pose challenges for long-range inference tasks and necessitate a memory storage system. While many storage approaches have been proposed, using episodic notes and graph representations of memory, retrieval methods still rely primarily on predefined workflows or static top-k similarity search over embeddings. To address this inflexibility, we introduce a novel tool-augmented autonomous memory retrieval framework (TA-Mem), which contains: (1) a memory extraction LLM agent that is prompted to adaptively chunk an input into sub-contexts based on semantic correlation and to extract information into structured notes; (2) a multi-indexed memory database designed for different types of query methods, including both key-based lookup and similarity-based retrieval; and (3) a tool-augmented memory retrieval agent that explores the memory autonomously by selecting appropriate tools provided by the database based on the user input, and decides whether to proceed to the next iteration or finalize the response after reasoning over the fetched memories. TA-Mem is evaluated on the LoCoMo dataset, achieving significant performance improvements over existing baseline approaches. In addition, an analysis of tool use across different question types demonstrates the adaptivity of the proposed method.
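To make the three components concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a toy bag-of-words similarity in place of learned embeddings and a caller-supplied `policy` function in place of the LLM retrieval agent; the names `Note`, `MemoryStore`, `key_lookup`, `similarity_search`, and `retrieve` are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Note:
    """A structured note; the fields are assumptions made for this sketch."""
    key: str   # index key, e.g. a speaker or topic
    text: str  # note content extracted from a chunk of the conversation


def bow(text):
    """Toy bag-of-words vector standing in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Multi-indexed store: one key index plus a similarity index."""

    def __init__(self):
        self.notes = []
        self.by_key = {}

    def add(self, note):
        self.notes.append(note)
        self.by_key.setdefault(note.key, []).append(note)

    def key_lookup(self, key):
        """Tool 1: exact key-based lookup."""
        return list(self.by_key.get(key, []))

    def similarity_search(self, query, k=2):
        """Tool 2: top-k notes ranked by similarity to the query."""
        q = bow(query)
        ranked = sorted(self.notes,
                        key=lambda n: cosine(q, bow(n.text)),
                        reverse=True)
        return ranked[:k]


def retrieve(store, question, policy, max_steps=3):
    """Iterative retrieval loop.

    `policy(question, gathered)` stands in for the LLM agent: it returns
    (tool_name, argument) to call a tool, or None to stop and answer.
    """
    gathered = []
    for _ in range(max_steps):
        action = policy(question, gathered)
        if action is None:
            break
        tool, arg = action
        if tool == "key_lookup":
            results = store.key_lookup(arg)
        elif tool == "similarity_search":
            results = store.similarity_search(arg)
        else:
            raise ValueError(f"unknown tool: {tool}")
        for note in results:
            if note not in gathered:  # skip notes already fetched
                gathered.append(note)
    return gathered
```

In this sketch the loop is bounded by `max_steps` so that a policy which never stops cannot iterate forever. In a real system the policy would be an LLM call that sees the question, the tool descriptions, and the notes gathered so far, and the similarity index would use dense embeddings rather than word counts.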