Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses context compression for long-horizon agents, whose execution generates long contextual histories. Because a compressor cannot anticipate what information the agent will need later, conventional approaches can degrade accuracy on the critical reasoning path in ways that are uncontrolled and hard to detect. To mitigate this, the paper proposes an asynchronous context compression mechanism that runs the compressor in parallel with the agent. It introduces trajectory alignment verification: a candidate summary, generated independently of the agent's continued execution, is compared against the agent’s subsequent reasoning steps, and an external discriminator evaluates how well the summary preserves forward-looking intent and essential factual content. This gives compression quality a structured validation signal. Evaluated on the SWE-bench Verified and BrowseComp benchmarks, the method improves task accuracy by up to 8.8 percentage points and reduces end-to-end latency by up to 39.7%.

📝 Abstract

To cope with the large contexts that long-horizon LLM agents produce, modern frameworks increasingly rely on compaction: invoking an LLM to rewrite the accumulated trajectory into a shorter summary that the agent resumes from. Today, compaction runs synchronously on the critical path of agent execution, but this can unpredictably degrade accuracy due to a structural validation gap: the compactor must condense context but is fundamentally unaware of precisely what information the agent will need later. Further, because post-compaction agent steps are conditioned on the new summary, targeted validation criteria do not exist, and errors silently propagate through coherent but incorrect behavior. Our key insight is that asynchronous compaction efficiently addresses this gap: by running the compactor in parallel with continued agent execution on the original context, the candidate summary and the agent's next steps are generated independently from the same pre-compaction state, yielding a validation signal independent of the summary itself. We build Slipstream, a trajectory-grounded compaction system that uses a judge to validate the candidate summary against the agent's continued reasoning, checking that it preserves both the agent's forward intent and the key facts and constraints it depends on. Across long-horizon coding (SWE-bench Verified) and web-browsing (BrowseComp) workloads, Slipstream improves task accuracy by up to 8.8 percentage points while reducing end-to-end latency by up to 39.7%.
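
The control flow described in the abstract can be illustrated with a minimal Python sketch. This is not Slipstream's implementation: the names `compact`, `continue_agent`, `judge`, `Verdict`, and `compact_and_validate` are hypothetical, the compactor and agent are deterministic stand-ins for LLM calls, and the judge is a toy word-overlap check rather than an LLM judge. The sketch only shows the structural idea: the summary and the agent's next steps are produced concurrently from the same pre-compaction trajectory, and the summary is then checked against those next steps.

```python
import asyncio
from dataclasses import dataclass, field

# Toy pre-compaction trajectory (illustrative data, not from the paper).
TRAJECTORY = [
    "read config.py",
    "found bug in parse_args",
    "constraint: keep python3.8 compat",
    "ran tests: 2 failures",
]


@dataclass
class Verdict:
    """Result of the toy judge: terms the agent still needs but the summary lost."""
    missing: set[str] = field(default_factory=set)

    @property
    def accepted(self) -> bool:
        return not self.missing


async def compact(trajectory: list[str], keep: int) -> str:
    # ASSUMPTION: stand-in for an LLM compactor; it just keeps the last `keep` steps.
    await asyncio.sleep(0)
    return " ; ".join(trajectory[-keep:])


async def continue_agent(trajectory: list[str]) -> list[str]:
    # ASSUMPTION: stand-in for the agent continuing on the ORIGINAL context.
    await asyncio.sleep(0)
    return ["edit parse_args", "verify python3.8 compat"]


def judge(summary: str, trajectory: list[str], next_steps: list[str]) -> Verdict:
    # Toy check: every term the next steps reuse from the trajectory
    # must still appear in the candidate summary.
    seen = {w for step in trajectory for w in step.split()}
    needed = {w for step in next_steps for w in step.split() if w in seen}
    present = set(summary.split())
    return Verdict(missing=needed - present)


async def compact_and_validate(trajectory: list[str], keep: int) -> tuple[str, Verdict]:
    # Run compactor and agent concurrently from the same pre-compaction state,
    # so the next steps are not conditioned on the candidate summary.
    summary, next_steps = await asyncio.gather(
        compact(trajectory, keep), continue_agent(trajectory)
    )
    return summary, judge(summary, trajectory, next_steps)


if __name__ == "__main__":
    for keep in (1, 3):
        summary, verdict = asyncio.run(compact_and_validate(TRAJECTORY, keep))
        print(keep, verdict.accepted, sorted(verdict.missing))
```

In this toy setup, the aggressive summary (`keep=1`) drops `parse_args` and the Python 3.8 constraint that the agent's next steps rely on, so the judge rejects it, while the longer summary (`keep=3`) is accepted. What the system does after a rejection is not stated in the abstract, so the sketch leaves that out.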

Problem

Research questions and friction points this paper is trying to address.

compaction

long-horizon agents

trajectory validation

context summarization

error propagation

Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous compaction

trajectory-grounded validation

long-horizon agents

context summarization

LLM agent reliability
