ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) discard knowledge generated during inference once the context is reset, hindering cross-task reuse and continual learning. To address this, we propose a concept-level reasoning memory system that automatically extracts abstract, natural-language concepts from reasoning traces and stores them in an external memory bank. Our approach combines modular concept extraction, dynamic memory retrieval, and prompt-based integration, enabling test-time continual learning without updating model parameters. This establishes a self-reinforcing loop: "reasoning → abstraction → storage → retrieval → enhanced reasoning." Evaluated on the ARC-AGI benchmark, our method achieves a 7.5% relative improvement over a memory-free baseline, outperforming it at all tested inference-compute scales. Results demonstrate that concept-level memory effectively supports long-term knowledge accumulation and generalizable, compositional reasoning.
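The pipeline above can be sketched in a few lines. This is a minimal, illustrative toy, not the paper's implementation: `ConceptMemory`, the keyword-overlap relevance score, and `build_prompt` are all assumed names standing in for the paper's LLM-based extraction and retrieval steps.

```python
class ConceptMemory:
    """External memory bank of abstract, natural-language concepts."""

    def __init__(self) -> None:
        self.concepts: list[str] = []

    def store(self, concepts: list[str]) -> None:
        # Deduplicate so memory grows only with novel abstractions.
        for concept in concepts:
            if concept not in self.concepts:
                self.concepts.append(concept)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Toy relevance score: word overlap between query and stored concept.
        # The paper's retrieval is LLM/prompt-driven; this only mimics the shape.
        query_words = set(query.lower().split())
        ranked = sorted(
            self.concepts,
            key=lambda c: len(query_words & set(c.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(query: str, memory: ConceptMemory) -> str:
    # Prompt-based integration: prepend retrieved concepts; no weight updates.
    hints = memory.retrieve(query)
    header = "\n".join(f"- {hint}" for hint in hints)
    return f"Relevant concepts:\n{header}\n\nTask:\n{query}"


memory = ConceptMemory()
memory.store([
    "repeating a motif tiles the grid",
    "symmetry can complete missing cells",
])
prompt = build_prompt("complete the grid using symmetry", memory)
```

The key design point the sketch preserves is that memory lives entirely outside the model: storage and retrieval operate on natural-language strings, and the only interface to the LLM is the assembled prompt.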

📝 Abstract
While inference-time scaling enables LLMs to carry out increasingly long and capable reasoning traces, the patterns and insights uncovered during these traces are immediately discarded once the context window is reset for a new query. External memory is a natural way to persist these discoveries, and recent work has shown clear benefits for reasoning-intensive tasks. We see an opportunity to make such memories more broadly reusable and scalable by moving beyond instance-based memory entries (e.g. exact query/response pairs, or summaries tightly coupled with the original problem context) toward concept-level memory: reusable, modular abstractions distilled from solution traces and stored in natural language. For future queries, relevant concepts are selectively retrieved and integrated into the prompt, enabling test-time continual learning without weight updates. Our design introduces new strategies for abstracting takeaways from rollouts and retrieving entries for new queries, promoting reuse and allowing memory to expand with additional experiences. On the challenging ARC-AGI benchmark, our method yields a 7.5% relative gain over a strong no-memory baseline with performance continuing to scale with inference compute. We find abstract concepts to be the most consistent memory design, outscoring the baseline at all tested inference compute scales. Moreover, we confirm that dynamically updating memory during test-time outperforms an otherwise identical fixed memory setting with additional attempts, supporting the hypothesis that solving more problems and abstracting more patterns to memory enables further solutions in a form of self-improvement. Code available at https://github.com/matt-seb-ho/arc_memo.
Problem

Research questions and friction points this paper is trying to address.

Persisting reasoning insights beyond context window resets
Moving from instance-based to concept-level memory entries
Enabling test-time continual learning without weight updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-level memory for reusable abstractions
Selective retrieval and integration of concepts
Test-time continual learning without weight updates
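The abstract's finding that dynamically updating memory at test time beats a fixed memory can be pictured as a simple loop: each solved problem is abstracted back into memory before the next query is attempted. The stubs below (`llm_solve`, `extract_concepts`, and the keyword-overlap `retrieve`) are hypothetical placeholders for the paper's LLM calls, shown only to make the control flow concrete.

```python
def llm_solve(query: str, hints: list[str]) -> str:
    # Stub for an LLM rollout conditioned on retrieved concept hints.
    return f"trace for '{query}' using {len(hints)} hint(s)"


def extract_concepts(trace: str) -> list[str]:
    # Stub for LLM-based abstraction of reusable takeaways from a trace.
    return [f"takeaway: {trace}"]


def retrieve(query: str, memory: list[str], k: int = 2) -> list[str]:
    # Toy keyword-overlap retrieval over natural-language concept entries.
    query_words = set(query.lower().split())
    ranked = sorted(
        memory,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def run_stream(queries: list[str], memory: list[str]) -> list[str]:
    traces = []
    for query in queries:
        hints = retrieve(query, memory)           # selective retrieval
        trace = llm_solve(query, hints)           # prompt-based integration
        memory.extend(extract_concepts(trace))    # dynamic update, no weight change
        traces.append(trace)
    return traces


memory: list[str] = []
traces = run_stream(["task one", "task two"], memory)
```

Because memory is written to inside the loop, later queries see concepts abstracted from earlier solutions; a fixed-memory baseline would skip the `memory.extend` step.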