🤖 AI Summary
Large language models (LLMs) suffer degraded code-completion performance in real-world software repositories, where project-specific APIs and cross-file dependencies matter. To address this, we propose a speculative retrieval agent: during indexing, it asynchronously prefetches and constructs the context anticipated for future edits, shifting retrieval entirely to the offline phase and eliminating inference-time latency overhead. We further identify and rectify future-context leakage, a critical flaw in existing benchmarks, and introduce a leakage-free synthetic evaluation benchmark. Our approach integrates repository-level dependency analysis, speculative context prediction, and retrieval-augmented generation. Experiments demonstrate absolute improvements of 9-11% (48-58% relative) in code-generation quality over the strongest baseline, while substantially reducing inference latency.
📝 Abstract
Large Language Models (LLMs) excel at code-related tasks but often struggle in realistic software repositories, where project-specific APIs and cross-file dependencies are crucial. Retrieval-augmented methods mitigate this by injecting repository context at inference time, but the tight inference-time latency budget forces a trade-off: either retrieval quality suffers, or the added latency degrades user experience. We address this limitation with SpecAgent, an agent that improves both latency and code-generation quality by proactively exploring repository files during indexing and constructing speculative context that anticipates future edits in each file. Because this work happens asynchronously at indexing time, the context can be computed thoroughly with its latency fully masked, and the speculative nature of the context improves code-generation quality. Additionally, we identify the problem of future-context leakage in existing benchmarks, which can inflate reported performance. To address it, we construct a synthetic, leakage-free benchmark that enables a more realistic evaluation of our agent against baselines. Experiments show that SpecAgent consistently achieves absolute gains of 9-11% (48-58% relative) over the best-performing baselines, while significantly reducing inference latency.
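To make the indexing-time/inference-time split concrete, here is a minimal, hypothetical sketch (not the paper's actual agent): at indexing time, each file's in-repo dependencies are resolved and their API signatures are prefetched into a cache, so that at completion time the "retrieval" step is just a dictionary lookup with no added latency. The function names and the toy import-based dependency analysis are illustrative assumptions.

```python
import ast

def index_repository(files):
    """Indexing time (offline, asynchronous): speculatively prefetch the
    context a future edit in each file is likely to need -- here, the
    top-level definitions of the in-repo modules that the file imports.
    `files` maps module names to their source code (toy assumption)."""
    # Pass 1: collect each module's top-level function/class names.
    defs = {}
    for name, src in files.items():
        tree = ast.parse(src)
        defs[name] = [node.name for node in tree.body
                      if isinstance(node, (ast.FunctionDef, ast.ClassDef))]
    # Pass 2: for each file, precompute the cross-file context it depends on.
    cache = {}
    for name, src in files.items():
        tree = ast.parse(src)
        imports = [alias.name for node in ast.walk(tree)
                   if isinstance(node, ast.Import) for alias in node.names]
        cache[name] = {mod: defs[mod] for mod in imports if mod in defs}
    return cache

def context_for_completion(cache, filename):
    """Inference time: zero retrieval latency -- just read the prebuilt
    speculative context for the file being edited."""
    return cache.get(filename, {})
```

In a real agent the prefetch step would be far richer (dependency graphs, predicted edit sites, LLM-constructed summaries), but the latency argument is the same: all of that cost is paid offline, before the user asks for a completion.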