Relative Positioning Based Code Chunking Method For Rich Context Retrieval In Repository Level Code Completion Task With Code Language Model

๐Ÿ“… 2025-10-07
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address insufficient context construction and difficulty in capturing long-range dependencies in warehouse-scale code completion, this paper proposes a relative-positioning-based code chunking and retrieval method. The approach preprocesses the codebase via syntactic parsing and semantic similarity computation, incorporates a relative position encoding mechanism to explicitly model logical distances among code fragments, and integrates retrieval augmentation for dynamic context aggregation. Unlike conventional sliding-window or naive nearest-neighbor retrieval strategies, our method significantly improves large language modelsโ€™ completion accuracy and context relevance under IDE-constrained context windows. Experimental results across multiple benchmark datasets demonstrate average improvements of 2.1% in BLEU-4 score and 3.7% in top-1 accuracy, validating the methodโ€™s effectiveness and generalizability in real-world development scenarios.

Technology Category

Application Category

๐Ÿ“ Abstract
Code completion can help developers improve efficiency and ease the development lifecycle. Although code completion is available in modern integrated development environments (IDEs), research lacks in determining what makes a good context for code completion based on the information available to the IDEs for the large language models (LLMs) to perform better. In this paper, we describe an effective context collection strategy to assist the LLMs in performing better at code completion tasks. The key idea of our strategy is to preprocess the repository into smaller code chunks and later use syntactic and semantic similarity-based code chunk retrieval with relative positioning. We found that code chunking and relative positioning of the chunks in the final context improve the performance of code completion tasks.
Problem

Research questions and friction points this paper is trying to address.

Determining optimal context for code completion using repository information
Developing effective context collection strategy for large language models
Improving code completion through chunking and relative positioning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preprocess repository into smaller code chunks
Retrieve chunks using syntactic semantic similarity
Apply relative positioning to improve context