Your Coding Intent is Secretly in the Context and You Should Deliberately Infer It Before Completion

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) show significantly degraded function-completion performance when source code lacks explicit documentation (e.g., docstrings), which hinders accurate intent understanding. Method: the paper proposes a three-stage, intention-driven approach: (1) context-aware intention encoding via reasoning over the surrounding code; (2) optional interactive intention refinement to improve precision; and (3) target-function generation conditioned on the clarified intention. Contribution/Results: the method shifts intention inference to the earliest stage of the pipeline and contributes a high-quality 40K-instance dataset with intermediate reasoning traces. It combines chain-of-thought prompting, context-signal extraction and synthesis, an optional interactive alignment mechanism, and multi-stage conditional generation. On DevEval and ComplexCodeEval, the approach yields average improvements exceeding 20% across mainstream LLMs, with the interactive component delivering further gains.

📝 Abstract
Large Language Models (LLMs) are increasingly used for function completion in repository-scale codebases. Prior studies demonstrate that when explicit instructions--such as docstrings--are provided, these models can generate highly accurate implementations. However, in real-world repositories, such annotations are frequently absent, and performance drops substantially without them. To address this gap, we frame the task as a three-stage process. The first stage focuses on intent inference, where the model analyzes the code preceding the target function to uncover cues about the desired functionality. Such preceding context often encodes subtle but critical information, and we design a reasoning-based prompting framework to guide the LLM through step-by-step extraction and synthesis of these signals before any code is generated. The second stage introduces an optional interactive refinement mechanism to handle cases where preceding context alone is insufficient for intent recovery. In this stage, the model proposes a small set of candidate intentions, enabling the developer to select or edit them so that the inferred intent closely matches the actual requirement. Finally, in the third stage, the LLM generates the target function conditioned on the finalized intent. To support this pipeline, we curate a dataset of 40,000 examples annotated with intermediate reasoning traces and corresponding docstrings. Extensive experiments on DevEval and ComplexCodeEval show that our approach consistently boosts multiple LLMs, achieving over 20% relative gains in both reference-based and execution-based metrics, with the interactive refinement stage delivering additional improvements beyond these gains.
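The three-stage pipeline described in the abstract can be sketched as plain Python. This is a hypothetical illustration, not the paper's implementation: `call_llm` is a stub standing in for any chat-completion API, and all prompt wording and function names are invented for the sketch.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real chat-completion call; replace with any LLM API."""
    return "def helper():\n    ..."  # canned reply so the sketch runs offline

def infer_intent(preceding_code: str) -> str:
    """Stage 1: prompt the model to reason step by step over the preceding context."""
    prompt = (
        "List the signals in the code below (names, call sites, data flow) that "
        "hint at what the missing function should do, then state the inferred "
        "intent as a docstring.\n\n" + preceding_code
    )
    return call_llm(prompt)

def refine_intent(candidates, chosen=None):
    """Stage 2 (optional): the developer selects or edits one candidate intent."""
    return candidates[chosen] if chosen is not None else candidates[0]

def complete_function(preceding_code: str, intent: str) -> str:
    """Stage 3: generate the target function conditioned on the finalized intent."""
    return call_llm(preceding_code + "\n# Intent: " + intent + "\nImplement it.")

context = "def load_users(path): ...\n# the target function follows"
intent = infer_intent(context)
completion = complete_function(context, refine_intent([intent]))
```

The key design point the abstract emphasizes is ordering: intent is made explicit before any code is generated, rather than being left implicit in the completion prompt.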
Problem

Research questions and friction points this paper is trying to address.

Function completion degrades sharply when repositories lack explicit instructions such as docstrings
How to infer the developer's intent from the code preceding the target function
How to recover intent interactively when the preceding context alone is insufficient
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intent inference from preceding code context
Interactive refinement for ambiguous intents
Multi-stage prompting with reasoning traces
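The interactive-refinement idea above can be made concrete with a small sketch: the model proposes a few candidate intents, and the developer either picks one or supplies an edited version. Both function names and the candidate format are assumptions for illustration; `propose_intents` stands in for sampling several intent hypotheses from an LLM.

```python
def propose_intents(context: str, k: int = 3):
    """Stand-in for sampling k intent hypotheses from an LLM over the context."""
    return [f"candidate {i}: intent inferred from {context!r}" for i in range(k)]

def refine(candidates, choice: int = 0, edit=None):
    """The developer selects candidate `choice`, or overrides it with an edit."""
    return edit if edit is not None else candidates[choice]

cands = propose_intents("load_config()")
picked = refine(cands, choice=1)                      # select the second candidate
edited = refine(cands, edit="Parse the YAML config")  # or supply a corrected intent
```

Keeping the refinement step optional matches the paper's framing: when the preceding context is already unambiguous, the pipeline proceeds directly to generation.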
Yanzhou Li
PhD, Nanyang Technological University
Natural Language Processing, Large Language Models, Code Intelligence
Tianlin Li
Nanyang Technological University
AI4SE, SE4AI, Trustworthy AI
Yiran Zhang
Nanyang Technological University, Singapore, Singapore
Shangqing Liu
Nanjing University
Software Engineering, Deep Learning
Aishan Liu
Beihang University, Beijing, China
Yang Liu
Nanyang Technological University, Singapore, Singapore