๐ค AI Summary
To address insufficient context construction and difficulty in capturing long-range dependencies in warehouse-scale code completion, this paper proposes a relative-positioning-based code chunking and retrieval method. The approach preprocesses the codebase via syntactic parsing and semantic similarity computation, incorporates a relative position encoding mechanism to explicitly model logical distances among code fragments, and integrates retrieval augmentation for dynamic context aggregation. Unlike conventional sliding-window or naive nearest-neighbor retrieval strategies, our method significantly improves large language modelsโ completion accuracy and context relevance under IDE-constrained context windows. Experimental results across multiple benchmark datasets demonstrate average improvements of 2.1% in BLEU-4 score and 3.7% in top-1 accuracy, validating the methodโs effectiveness and generalizability in real-world development scenarios.
๐ Abstract
Code completion can help developers improve efficiency and ease the development lifecycle. Although code completion is available in modern integrated development environments (IDEs), research lacks in determining what makes a good context for code completion based on the information available to the IDEs for the large language models (LLMs) to perform better. In this paper, we describe an effective context collection strategy to assist the LLMs in performing better at code completion tasks. The key idea of our strategy is to preprocess the repository into smaller code chunks and later use syntactic and semantic similarity-based code chunk retrieval with relative positioning. We found that code chunking and relative positioning of the chunks in the final context improve the performance of code completion tasks.