🤖 AI Summary
Large language models (LLMs) suffer degraded code-completion performance in real-world software repositories, where project-specific APIs and cross-file dependencies matter. To address this, we propose a speculative retrieval agent: during indexing, it asynchronously prefetches and constructs the context anticipated for future edits, shifting retrieval entirely to the offline phase and eliminating inference-time latency overhead. We further identify and rectify future-context leakage, a critical flaw in existing benchmarks, and introduce a leakage-free synthetic evaluation benchmark. Our approach integrates repository-level dependency analysis, speculative context prediction, and retrieval-augmented generation. Experiments demonstrate absolute improvements of 9-11% (48-58% relative) in code-generation quality over the strongest baseline, while substantially reducing inference latency.
📝 Abstract
Large Language Models (LLMs) excel at code-related tasks but often struggle in realistic software repositories, where project-specific APIs and cross-file dependencies are crucial. Retrieval-augmented methods mitigate this by injecting repository context at inference time, but the tight inference-time latency budget forces a trade-off: either retrieval quality suffers, or the added latency degrades user experience. We address this limitation with SpecAgent, an agent that improves both latency and code-generation quality by proactively exploring repository files during indexing and constructing speculative context that anticipates future edits in each file. Because this work happens asynchronously at indexing time, the context can be computed thoroughly with its latency fully masked, and the speculative nature of the context improves code-generation quality. Additionally, we identify the problem of future-context leakage in existing benchmarks, which can inflate reported performance. To address it, we construct a synthetic, leakage-free benchmark that enables a more realistic evaluation of our agent against baselines. Experiments show that SpecAgent consistently achieves absolute gains of 9-11% (48-58% relative) over the best-performing baselines, while significantly reducing inference latency.
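To make the indexing-time/inference-time split concrete, here is a minimal, hypothetical sketch (not the paper's actual agent): at indexing time, each file's in-repo dependencies are resolved and their API signatures are prefetched into a cache, so that at completion time the "retrieval" step is just a dictionary lookup with no added latency. The function names and the toy import-based dependency analysis are illustrative assumptions.

```python
import ast

def index_repository(files):
    """Indexing time (offline, asynchronous): speculatively prefetch the
    context a future edit in each file is likely to need -- here, the
    top-level definitions of the in-repo modules that the file imports.
    `files` maps module names to their source code (toy assumption)."""
    # Pass 1: collect each module's top-level function/class names.
    defs = {}
    for name, src in files.items():
        tree = ast.parse(src)
        defs[name] = [node.name for node in tree.body
                      if isinstance(node, (ast.FunctionDef, ast.ClassDef))]
    # Pass 2: for each file, precompute the cross-file context it depends on.
    cache = {}
    for name, src in files.items():
        tree = ast.parse(src)
        imports = [alias.name for node in ast.walk(tree)
                   if isinstance(node, ast.Import) for alias in node.names]
        cache[name] = {mod: defs[mod] for mod in imports if mod in defs}
    return cache

def context_for_completion(cache, filename):
    """Inference time: zero retrieval latency -- just read the prebuilt
    speculative context for the file being edited."""
    return cache.get(filename, {})
```

In a real agent the prefetch step would be far richer (dependency graphs, predicted edit sites, LLM-constructed summaries), but the latency argument is the same: all of that cost is paid offline, before the user asks for a completion.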