ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) discard knowledge generated during inference once the context is reset, hindering cross-task reuse and continual learning. To address this, we propose a concept-level reasoning memory system that automatically extracts abstract, natural-language concepts from reasoning traces and stores them in an external memory bank. Our approach combines modular concept extraction, dynamic memory retrieval, and prompt-based integration, enabling test-time continual learning without updating model parameters. This establishes a self-reinforcing loop: "reasoning → abstraction → storage → retrieval → enhanced reasoning." Evaluated on the ARC-AGI benchmark, our method achieves a 7.5% relative improvement over a memory-free baseline, outperforming it at all tested inference-compute scales. Results demonstrate that concept-level memory effectively supports long-term knowledge accumulation and generalizable, compositional reasoning.
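The pipeline above can be sketched in a few lines. This is a minimal, illustrative toy, not the paper's implementation: `ConceptMemory`, the keyword-overlap relevance score, and `build_prompt` are all assumed names standing in for the paper's LLM-based extraction and retrieval steps.

```python
class ConceptMemory:
    """External memory bank of abstract, natural-language concepts."""

    def __init__(self) -> None:
        self.concepts: list[str] = []

    def store(self, concepts: list[str]) -> None:
        # Deduplicate so memory grows only with novel abstractions.
        for concept in concepts:
            if concept not in self.concepts:
                self.concepts.append(concept)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Toy relevance score: word overlap between query and stored concept.
        # The paper's retrieval is LLM/prompt-driven; this only mimics the shape.
        query_words = set(query.lower().split())
        ranked = sorted(
            self.concepts,
            key=lambda c: len(query_words & set(c.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(query: str, memory: ConceptMemory) -> str:
    # Prompt-based integration: prepend retrieved concepts; no weight updates.
    hints = memory.retrieve(query)
    header = "\n".join(f"- {hint}" for hint in hints)
    return f"Relevant concepts:\n{header}\n\nTask:\n{query}"


memory = ConceptMemory()
memory.store([
    "repeating a motif tiles the grid",
    "symmetry can complete missing cells",
])
prompt = build_prompt("complete the grid using symmetry", memory)
```

The key design point the sketch preserves is that memory lives entirely outside the model: storage and retrieval operate on natural-language strings, and the only interface to the LLM is the assembled prompt.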

📝 Abstract
While inference-time scaling enables LLMs to carry out increasingly long and capable reasoning traces, the patterns and insights uncovered during these traces are immediately discarded once the context window is reset for a new query. External memory is a natural way to persist these discoveries, and recent work has shown clear benefits for reasoning-intensive tasks. We see an opportunity to make such memories more broadly reusable and scalable by moving beyond instance-based memory entries (e.g. exact query/response pairs, or summaries tightly coupled with the original problem context) toward concept-level memory: reusable, modular abstractions distilled from solution traces and stored in natural language. For future queries, relevant concepts are selectively retrieved and integrated into the prompt, enabling test-time continual learning without weight updates. Our design introduces new strategies for abstracting takeaways from rollouts and retrieving entries for new queries, promoting reuse and allowing memory to expand with additional experiences. On the challenging ARC-AGI benchmark, our method yields a 7.5% relative gain over a strong no-memory baseline with performance continuing to scale with inference compute. We find abstract concepts to be the most consistent memory design, outscoring the baseline at all tested inference compute scales. Moreover, we confirm that dynamically updating memory during test-time outperforms an otherwise identical fixed memory setting with additional attempts, supporting the hypothesis that solving more problems and abstracting more patterns to memory enables further solutions in a form of self-improvement. Code available at https://github.com/matt-seb-ho/arc_memo.
Problem

Research questions and friction points this paper is trying to address.

Persisting reasoning insights beyond context window resets
Moving from instance-based to concept-level memory entries
Enabling test-time continual learning without weight updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-level memory for reusable abstractions
Selective retrieval and integration of concepts
Test-time continual learning without weight updates
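The abstract's finding that dynamically updating memory at test time beats a fixed memory can be pictured as a simple loop: each solved problem is abstracted back into memory before the next query is attempted. The stubs below (`llm_solve`, `extract_concepts`, and the keyword-overlap `retrieve`) are hypothetical placeholders for the paper's LLM calls, shown only to make the control flow concrete.

```python
def llm_solve(query: str, hints: list[str]) -> str:
    # Stub for an LLM rollout conditioned on retrieved concept hints.
    return f"trace for '{query}' using {len(hints)} hint(s)"


def extract_concepts(trace: str) -> list[str]:
    # Stub for LLM-based abstraction of reusable takeaways from a trace.
    return [f"takeaway: {trace}"]


def retrieve(query: str, memory: list[str], k: int = 2) -> list[str]:
    # Toy keyword-overlap retrieval over natural-language concept entries.
    query_words = set(query.lower().split())
    ranked = sorted(
        memory,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def run_stream(queries: list[str], memory: list[str]) -> list[str]:
    traces = []
    for query in queries:
        hints = retrieve(query, memory)           # selective retrieval
        trace = llm_solve(query, hints)           # prompt-based integration
        memory.extend(extract_concepts(trace))    # dynamic update, no weight change
        traces.append(trace)
    return traces


memory: list[str] = []
traces = run_stream(["task one", "task two"], memory)
```

Because memory is written to inside the loop, later queries see concepts abstracted from earlier solutions; a fixed-memory baseline would skip the `memory.extend` step.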