Challenge on Optimization of Context Collection for Code Completion

📅 2025-10-05
🤖 AI Summary
This study addresses the challenge of improving fill-in-the-middle (FIM) code completion quality for Python and Kotlin by optimizing project-level contextual retrieval and filtering from source code repositories. Methodologically, it integrates neural language models, code retrieval techniques, and a lightweight context relevance filtering algorithm to enhance both the coverage and the semantic relevance of input contexts. To enable rigorous evaluation, the organizers construct a large-scale benchmark dataset derived from real-world, permissively licensed open-source projects and introduce a multi-model competition framework using chrF as a unified evaluation metric for the systematic comparison of context selection strategies. During the public phase, nineteen teams competed in the Python track and eight in the Kotlin track; six teams entered the private phase, five of which submitted full papers. This work establishes a reproducible, quantitatively evaluable paradigm for context optimization in AI-assisted software engineering.
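A fill-in-the-middle prompt rearranges the code around the cursor so the model generates the missing span last, with retrieved repository context prepended. The sketch below illustrates this setup; the sentinel token names and the `# file:` header convention are assumptions for illustration, as actual FIM tokens vary by model:

```python
def build_fim_prompt(context_files, prefix, suffix,
                     fim_prefix="<|fim_prefix|>",
                     fim_suffix="<|fim_suffix|>",
                     fim_middle="<|fim_middle|>"):
    """Assemble a FIM prompt: retrieved repository context comes first,
    then prefix and suffix around the hole; the model continues after
    the middle sentinel, producing the missing code."""
    context = "".join(
        f"# file: {path}\n{body}\n" for path, body in context_files
    )
    return f"{context}{fim_prefix}{prefix}{fim_suffix}{suffix}{fim_middle}"
```

The completion quality then depends heavily on which files end up in `context_files`, which is exactly what the challenge asks participants to optimize.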

📝 Abstract
The rapid advancement of workflows and methods for software engineering using AI emphasizes the need for a systematic evaluation and analysis of their ability to leverage information from entire projects, particularly in large code bases. In this challenge on optimization of context collection for code completion, organized by JetBrains in collaboration with Mistral AI as part of the ASE 2025 conference, participants developed efficient mechanisms for collecting context from source code repositories to improve fill-in-the-middle code completions for Python and Kotlin. We constructed a large dataset of real-world code in these two programming languages using permissively licensed open-source projects. The submissions were evaluated based on their ability to maximize completion quality for multiple state-of-the-art neural models using the chrF metric. During the public phase of the competition, nineteen teams submitted solutions to the Python track and eight teams submitted solutions to the Kotlin track. In the private phase, six teams competed, of which five submitted papers to the workshop.
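One simple baseline for the context-collection mechanisms the abstract describes is to rank candidate files by lexical overlap with the code around the completion site. The sketch below is illustrative only, not the method of any submission; the function name and scoring scheme are assumptions:

```python
import re
from collections import Counter

def rank_context_files(completion_site, candidate_files, top_k=3):
    """Illustrative baseline: score each (path, body) candidate by
    identifier overlap with the code around the completion site and
    keep the top-k files as context."""
    def tokenize(text):
        return Counter(re.findall(r"[A-Za-z_]\w*", text))

    query = tokenize(completion_site)

    def score(body):
        toks = tokenize(body)
        overlap = sum((query & toks).values())  # multiset intersection
        return overlap / (1 + sum(toks.values()))  # length-normalized

    ranked = sorted(candidate_files, key=lambda pf: score(pf[1]), reverse=True)
    return ranked[:top_k]
```

Length normalization penalizes large files that match only incidentally; real submissions can replace this scorer with embedding similarity, dependency analysis, or learned rankers.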
Problem

Research questions and friction points this paper is trying to address.

Optimizing context collection from codebases for AI code completion
Improving fill-in-the-middle completions for Python and Kotlin languages
Evaluating context efficiency using neural models and chrF metric
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized context collection from code repositories
Improved fill-in-the-middle code completion techniques
Evaluated submissions using chrF metric on neural models
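chrF, the metric used to score submissions, is a character n-gram F-score. A minimal sketch is below, assuming the common defaults (n-grams up to order 6, recall-weighted F with beta = 2, whitespace ignored); production evaluation would use a reference implementation such as sacrebleu rather than this simplification:

```python
from collections import Counter

def char_ngrams(text, n):
    # chrF typically ignores whitespace when forming character n-grams
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified chrF: average character n-gram precision and recall
    over orders 1..max_order, combined into an F-beta score (0-100)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())
        if sum(hyp.values()) > 0 and sum(ref.values()) > 0:
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r) * 100
```

Character-level matching makes chrF tolerant of small surface differences (identifier fragments, punctuation), which suits code completion where near-miss outputs still carry value.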
Dmitry Ustalov
JetBrains, Belgrade, Serbia
Egor Bogomolov
JetBrains Research
machine learning for software engineering
Alexander Bezzubov
JetBrains Research, Amsterdam, The Netherlands
Yaroslav Golubev
JetBrains Research
OSS licenses, code changes, refactorings, software ecosystems, empirical software engineering
Evgeniy Glukhov
JetBrains Research, Amsterdam, The Netherlands
Georgii Levtsov
Neapolis University Pafos, JetBrains, Pafos, Cyprus
Vladimir Kovalenko
JetBrains Research, Amsterdam, The Netherlands