🤖 AI Summary
Large language models (LLMs) face two key challenges in code completion: limited context-window capacity and susceptibility to noisy, irrelevant context. This paper investigates how context granularity (file-level versus block-level) and retrieval ranking strategies affect generation quality. The authors propose a static-analysis-driven, block-level context retrieval method that extracts fine-grained, semantically relevant context, followed by optimized context composition and ordering. Experiments on Python code completion show that the approach improves completion accuracy by 6% over the best-performing file-level retrieval baseline and by 16% over a no-context baseline. The core contributions are: (1) empirical evidence that block-level context significantly improves effectiveness under strict context-length constraints; and (2) a lightweight, deployable retrieval framework that integrates static program analysis with context sequence control. This work offers a practical paradigm for context optimization in industrial code completion systems.
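To make "static-analysis-driven, block-level retrieval" concrete, here is a minimal sketch of how such an extractor might work for Python, using the standard `ast` module to split a file into function- and class-level blocks. The helper `extract_blocks` and its block granularity are illustrative assumptions, not the paper's actual implementation:

```python
import ast

def extract_blocks(source: str, path: str) -> list[dict]:
    """Split a Python file into function- and class-level blocks.

    Hypothetical sketch of block-level chunking via static analysis;
    the paper's extractor and chosen granularity may differ.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    blocks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # AST nodes carry 1-based line spans (end_lineno needs Python 3.8+).
            snippet = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            blocks.append({
                "file": path,
                "name": node.name,
                "start": node.lineno,
                "end": node.end_lineno,
                "code": snippet,
            })
    return blocks
```

Chunking at syntactic boundaries like these keeps each retrieved unit semantically self-contained, which is what allows fine-grained selection under a tight context budget.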
📝 Abstract
Context plays an important role in the quality of code completion, as Large Language Models (LLMs) require sufficient and relevant information to assist developers in code generation tasks. However, composing a relevant context for code completion poses challenges in large repositories: first, the limited context length of LLMs makes it impractical to include all repository files; second, the quality of generated code is highly sensitive to noisy or irrelevant context. In this paper, we present our approach for the ASE 2025 Context Collection Challenge, which entails outperforming JetBrains baselines by designing effective retrieval and context collection strategies. We design and evaluate a series of experiments involving retrieval strategies at both the file and chunk level. Our initial experiments examine the impact of context size and file ordering on LLM performance, and show that both the amount and the order of context can significantly influence the performance of the models. We then introduce chunk-based retrieval using static analysis, achieving a 6% improvement over our best file-retrieval strategy and a 16% improvement over the no-context baseline for Python in the initial phase of the competition. Our results highlight the importance of retrieval granularity, ordering, and hybrid strategies in building effective context collection pipelines for real-world development scenarios.
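The abstract's point about amount and ordering of context can be illustrated with a small sketch that ranks extracted blocks (e.g., the dictionaries produced by the `extract_blocks` sketch above), keeps as many as fit a budget, and orders them before prompting. The lexical `score` function, character budget, and most-relevant-last ordering are assumed stand-ins, not the competition pipeline:

```python
def score(chunk_code: str, query: str) -> float:
    """Toy lexical-overlap score; a real pipeline might use BM25 or embeddings."""
    chunk_tokens = set(chunk_code.split())
    query_tokens = set(query.split())
    return len(chunk_tokens & query_tokens) / (len(query_tokens) or 1)

def compose_context(blocks: list[dict], query: str, budget_chars: int = 4000) -> str:
    """Rank blocks by relevance, keep those that fit the budget, then order them.

    Assumed heuristic: place the most relevant block last, i.e. closest to
    the completion point; other ordering policies are possible.
    """
    ranked = sorted(blocks, key=lambda b: score(b["code"], query), reverse=True)
    selected, used = [], 0
    for block in ranked:
        cost = len(block["code"])
        if used + cost > budget_chars:
            continue
        selected.append(block)
        used += cost
    # Reverse so the highest-scoring block appears nearest the cursor.
    return "\n\n".join(b["code"] for b in reversed(selected))
```

Here `query` would typically be the code surrounding the completion point; keeping the most relevant chunk nearest the cursor is one way to act on the observation that ordering, not just selection, affects completion quality.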