🤖 AI Summary
Prior deep learning–based code completion methods primarily leverage the local code surrounding the completion point, neglecting richer contextual signals. Method: We conduct an empirical study evaluating the impact of 8 types of contextual information on a DL-based code completion technique, spanning coding contexts (e.g., structurally related code components), process contexts (e.g., relevant open issues), and developer contexts (e.g., frequently used APIs), as well as their combinations. Contribution/Results: Experiments show that enriching the input with additional context yields relative improvements of up to +22% in correct predictions over completing with the local code alone. The analysis also shows which context types complement each other, providing empirical evidence and practical guidance for designing context-aware code completion in intelligent programming assistants.
📝 Abstract
Code completion aims at speeding up code writing by recommending to developers the next tokens they are likely to type. Deep Learning (DL) models pushed the boundaries of code completion by redefining what these coding assistants can do: We moved from predicting a few code tokens to automatically generating entire functions. One important factor impacting the performance of DL-based code completion techniques is the context provided to them as input. By "context" we refer to what the model knows about the code to complete. In a simple scenario, the DL model might be fed with a partially implemented function to complete. In this case, the context is represented by the incomplete function and, based on it, the model must generate a prediction. It is, however, possible to expand such a context to include additional information, like the whole source code file containing the function to complete, which could be useful to boost the prediction performance. In this work, we present an empirical study investigating how the performance of a DL-based code completion technique is affected by different contexts. We experiment with 8 types of contexts and their combinations. These contexts include: (i) coding contexts, featuring information extracted from the code base in which the code completion is invoked (e.g., code components structurally related to the one to "complete"); (ii) process contexts, with information aimed at depicting the current status of the project in which a code completion task is triggered (e.g., a textual representation of open issues relevant for the code to complete); and (iii) developer contexts, capturing information about the developer invoking the code completion (e.g., the APIs they frequently use). Our results show that additional contextual information can benefit the performance of DL-based code completion, with relative improvements of up to +22% in terms of correct predictions.
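To make the notion of "expanding the context" concrete, the sketch below shows one simple way such heterogeneous contexts could be assembled into a single model input. All names and the section layout are illustrative assumptions, not the paper's actual implementation: the idea is only that optional coding, process, and developer contexts are prepended to the incomplete function, with the function itself preserved when the budget is exceeded.

```python
# Hypothetical sketch: build an extended input for a DL code completion model
# by prepending optional context sections to the function to complete.
# Section labels and truncation strategy are assumptions for illustration.

def build_completion_input(incomplete_function,
                           file_context=None,   # coding context: rest of the file
                           open_issues=None,    # process context: relevant issue texts
                           api_history=None,    # developer context: frequently used APIs
                           max_chars=4000):
    """Concatenate labeled context sections before the incomplete function,
    then truncate from the left so the function to complete is never cut."""
    sections = []
    if api_history:
        sections.append("# Developer APIs:\n" + "\n".join(api_history))
    if open_issues:
        sections.append("# Open issues:\n" + "\n".join(open_issues))
    if file_context:
        sections.append("# File context:\n" + file_context)
    sections.append("# Complete:\n" + incomplete_function)
    prompt = "\n\n".join(sections)
    # Left-truncation drops the most distant context first if over budget.
    return prompt[-max_chars:]

prompt = build_completion_input(
    "def parse_config(path):\n    ",
    file_context="import json",
    api_history=["json.load", "open"],
)
```

In this simple scheme, studying a context type (or a combination) amounts to toggling the corresponding argument on or off and measuring the effect on prediction accuracy.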