Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models (LLMs) struggle to reuse historical reasoning experiences during inference, resulting in poor cross-task adaptability. To address this, we propose Log-Augmented Generation (LAG), the first framework to elevate key-value (KV) caching from a mere efficiency optimization tool to a generalizable mechanism for memory and reasoning enhancement. LAG structurally logs complete reasoning trajectories into a retrievable KV cache memory bank and enables direct cross-task cache matching and reuse based on similarity—without requiring reflection or knowledge distillation. Evaluated on knowledge- and reasoning-intensive benchmarks, LAG significantly outperforms both log-free baselines and state-of-the-art reflection-based or KV-caching methods, achieving up to 12.7% absolute accuracy gain. Crucially, it maintains low inference latency and high scalability, demonstrating that structured logging of reasoning processes unlocks effective, efficient, and generalizable memory reuse in LLMs.

📝 Abstract

While humans naturally learn and adapt from past experiences, large language models (LLMs) and their agentic counterparts struggle to retain reasoning from previous tasks and apply it in future contexts. To address this limitation, we propose a novel framework, log-augmented generation (LAG), that directly reuses prior computation and reasoning from past logs at test time. This enhances the model's ability to learn from previous tasks and perform better on new, unseen challenges, while keeping the system efficient and scalable. Specifically, our system represents task logs using key-value (KV) caches, encoding the full reasoning context of prior tasks while storing KV caches for only a selected subset of tokens. When a new task arises, LAG retrieves the KV values from relevant logs to augment generation. Our approach differs from reflection-based memory mechanisms by directly reusing prior reasoning and computations without requiring additional steps for knowledge extraction or distillation. Our method also goes beyond existing KV caching techniques, which primarily target efficiency gains rather than accuracy improvements. Experiments on knowledge- and reasoning-intensive datasets demonstrate that our method significantly outperforms standard agentic systems that do not utilize logs, as well as existing solutions based on reflection and KV cache techniques.
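The store-then-retrieve flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `LogEntry`, `LogStore`, and `retrieve`, the use of cosine similarity over task embeddings, and the plain-list stand-in for a KV cache are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One logged task (hypothetical structure, not from the paper)."""
    task: str
    embedding: list[float]  # assumed: some vector representation of the task
    # Assumed stand-in for a KV cache: one (key, value) pair per retained token.
    kv_cache: list[tuple[list[float], list[float]]]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class LogStore:
    """Toy in-memory bank of logged tasks, searched by embedding similarity."""

    def __init__(self) -> None:
        self.entries: list[LogEntry] = []

    def add(self, entry: LogEntry) -> None:
        self.entries.append(entry)

    def retrieve(self, query_embedding: list[float], top_k: int = 1) -> list[LogEntry]:
        """Return up to top_k entries, most similar to the query first."""
        ranked = sorted(
            self.entries,
            key=lambda e: cosine(query_embedding, e.embedding),
            reverse=True,
        )
        return ranked[:top_k]
```

In a real system the retrieved `kv_cache` would hold per-layer tensors produced by the model, and the similarity measure and selection of retained tokens would follow the paper's method rather than this sketch.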

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' ability to reuse prior reasoning from logs

Improving test-time performance on unseen tasks efficiently

Scaling reasoning without reflection or knowledge distillation steps

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reuses prior computation via log-augmented generation (LAG)

Encodes reasoning in key-value caches for retrieval

Enhances accuracy without extra distillation steps
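To see why reusing logged KV pairs can change what a model generates, consider a toy single-head attention step in which retrieved pairs are placed in front of the current task's own pairs. This is a simplified sketch under stated assumptions: `attend` and `attend_with_log` are hypothetical helper names, and a real transformer applies this per layer and per head and must also handle positional encoding, which is ignored here.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attend_with_log(query, task_kv, log_kv):
    """Attend over retrieved log KV pairs followed by the current task's pairs.

    Both arguments are lists of (key, value) tuples; log_kv may be empty.
    """
    keys = [k for k, _ in log_kv] + [k for k, _ in task_kv]
    values = [v for _, v in log_kv] + [v for _, v in task_kv]
    return attend(query, keys, values)


# The same query yields a different output once logged KV pairs are visible.
task_kv = [([1.0, 0.0], [0.0, 0.0])]
log_kv = [([1.0, 0.0], [2.0, 2.0])]
without_log = attend_with_log([1.0, 0.0], task_kv, [])
with_log = attend_with_log([1.0, 0.0], task_kv, log_kv)
```

Because the logged pairs enter attention directly, no separate reflection or distillation pass is needed to expose prior reasoning to the new task, which is the property the bullets above highlight.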

🔎 Similar Papers

Retrieval-Augmented Test Generation: How Far Are We?