Memento: Note-Taking for Your Future Self

📅 2025-06-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited performance of large language models (LLMs) on multi-hop question answering, where reasoning must be tightly coupled with retrieval, this paper proposes Memento, a three-stage prompting strategy: (1) decompose the question into smaller steps, (2) dynamically build a database of facts from intermediate reasoning outputs, and (3) piece those facts together into the final answer. The core idea is to treat reasoning as “taking notes for one’s future self,” so that retrieval and reasoning feed into each other in a closed loop. Memento is applied on top of existing prompting strategies: chain-of-thought (CoT), CoT with retrieval-augmented generation (CoT-RAG), and ReAct. Reported results: on PhantomWiki, Memento doubles CoT performance (a 100% improvement); on 2WikiMultiHopQA, it gains 20.1 F1 points over CoT-RAG and more than 13 F1 points over the multi-hop RAG baseline IRCoT; on MuSiQue, it gains 3.2 F1 points over ReAct.

📝 Abstract

Large language models (LLMs) excel at reasoning-only tasks, but struggle when reasoning must be tightly coupled with retrieval, as in multi-hop question answering. To overcome these limitations, we introduce a prompting strategy that first decomposes a complex question into smaller steps, then dynamically constructs a database of facts using LLMs, and finally pieces these facts together to solve the question. We show how this three-stage strategy, which we call Memento, can boost the performance of existing prompting strategies across diverse settings. On the 9-step PhantomWiki benchmark, Memento doubles the performance of chain-of-thought (CoT) when all information is provided in context. On the open-domain version of 2WikiMultiHopQA, CoT-RAG with Memento improves over vanilla CoT-RAG by more than 20 F1 percentage points and over the multi-hop RAG baseline, IRCoT, by more than 13 F1 percentage points. On the challenging MuSiQue dataset, Memento improves ReAct by more than 3 F1 percentage points, demonstrating its utility in agentic settings.

Problem

Research questions and friction points this paper is trying to address.

Improving LLMs' retrieval-augmented reasoning for complex questions

Decomposing questions into steps and dynamically building fact databases

Boosting performance of existing prompting strategies in multi-hop QA

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes questions into smaller steps

Dynamically constructs a fact database

Pieces facts together to solve questions
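
The three stages listed above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name memento, the PLAN/ANSWER/FINAL prompt prefixes, the {s1}-style placeholders, and the toy_llm stand-in are all hypothetical choices made for illustration, and a real system would call an actual LLM and retriever instead.

```python
from typing import Callable, Dict

# Any function mapping a prompt string to a completion string (assumed interface).
LLM = Callable[[str], str]


def memento(question: str, llm: LLM) -> str:
    """Hypothetical three-stage loop: plan, build a fact database, synthesize."""
    # Stage 1: ask the model to decompose the question, one sub-question per line.
    plan = [s.strip() for s in llm(f"PLAN: {question}").splitlines() if s.strip()]

    # Stage 2: answer each step in order, storing results as notes (the fact database).
    facts: Dict[str, str] = {}
    for i, step in enumerate(plan, start=1):
        # Fill placeholders such as {s1} with answers recorded by earlier steps.
        for key, value in facts.items():
            step = step.replace("{" + key + "}", value)
        facts[f"s{i}"] = llm(f"ANSWER: {step}")

    # Stage 3: hand the original question plus all notes back to the model.
    notes = "\n".join(f"{key}: {value}" for key, value in facts.items())
    return llm(f"FINAL: {question}\n{notes}")


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for an LLM over a made-up two-fact world."""
    kind, _, body = prompt.partition(": ")
    if kind == "PLAN":
        return "Who is the mother of Ana?\nWhat is the job of {s1}?"
    if kind == "ANSWER":
        knowledge = {
            "Who is the mother of Ana?": "Bea",
            "What is the job of Bea?": "carpenter",
        }
        return knowledge.get(body, "unknown")
    # FINAL: return the value of the last recorded note.
    return body.strip().splitlines()[-1].split(": ", 1)[1]


if __name__ == "__main__":
    print(memento("What is the job of Ana's mother?", toy_llm))
```

Because the second sub-question is only fully specified once the first answer is stored, the sketch shows why the fact database has to be built dynamically rather than retrieved in one pass.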

🔎 Similar Papers

OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering