An Empirical Study of Proactive Coding Assistants in Real-World Software Development

2026-05-07
Citations: 0
Influential: 0

AI Summary
This work addresses a critical limitation in existing research on proactive programming assistants, which relies on IDE interaction data simulated by large language models and fails to capture authentic developer behavior, leading to biased evaluations. To bridge this gap, the authors developed a VS Code extension to collect real-world interaction logs from 1,246 industrial developers and constructed paired model-simulated trajectories. Their analysis reveals significant discrepancies between real and simulated behaviors in terms of action diversity, temporal structure, and exploration patterns. Building on these insights, they introduce ProCodeBench, the first real-world benchmark for evaluating proactive programming assistants. Experiments demonstrate that state-of-the-art methods perform substantially worse on real trajectories than on simulated ones, underscoring the essential role of authentic behavioral data in developing effective assistants, while suggesting simulated data may still serve as a supplementary training resource.
Abstract
Large language model (LLM)-based coding assistants have made substantial progress, yet most systems remain reactive, requiring developers to explicitly formulate their needs. Proactive coding assistants aim to infer latent developer intent from integrated development environment (IDE) interactions and repository context, thereby reducing interaction overhead and supporting more seamless assistance. However, research in this direction is limited by the scarcity of large-scale real-world developer behavior data. Existing studies therefore often rely on LLM-simulated IDE traces, whose fidelity to real development behavior remains unclear. In this paper, we investigate this simulation-to-reality gap through a large-scale empirical study. We collect real IDE interaction traces from 1,246 experienced industry developers over three consecutive days using a custom Visual Studio Code extension, and construct paired LLM-simulated traces for controlled comparison. Our analysis shows that simulated traces differ substantially from real traces in behavioral diversity, temporal structure, and exploratory patterns. Based on the collected data, we introduce ProCodeBench, a real-world benchmark for proactive intent prediction. Experiments with representative LLMs, retrieval-augmented methods, and agentic baselines show that current approaches remain far from reliable under real IDE traces, suggesting that simulation-based evaluation can overestimate real-world performance. Finally, our training study shows that simulated data cannot replace real data, but can complement it when used before real-world fine-tuning. These findings highlight the importance of real developer behavior data for evaluating and training proactive coding assistants.
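
The abstract does not say how behavioral diversity is measured. As a purely illustrative sketch, one common choice (an assumption here, not the authors' stated metric) is the Shannon entropy of the action-type distribution in a trace. The function name `action_entropy`, the action labels, and both example traces are hypothetical and do not come from the paper's data.

```python
import math
from collections import Counter


def action_entropy(actions):
    """Shannon entropy (in bits) of the action-type distribution in one trace.

    Hypothetical helper: higher values mean the trace spreads its actions
    more evenly across more distinct action types.
    """
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty trace carries no diversity
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up traces for illustration only (not taken from the study).
real_trace = ["edit", "scroll", "open_file", "edit",
              "run_test", "search", "scroll", "undo"]
simulated_trace = ["edit", "edit", "edit", "save",
                   "edit", "edit", "save", "edit"]

real_h = action_entropy(real_trace)
sim_h = action_entropy(simulated_trace)
print(f"real: {real_h:.3f} bits, simulated: {sim_h:.3f} bits")
```

Comparing such a statistic across paired real and simulated traces is one simple way to quantify a diversity gap. Temporal structure and exploratory patterns would need separate measures, which this sketch does not cover.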
Problem

Research questions and friction points this paper is trying to address.

proactive coding assistants
real-world developer behavior
IDE interaction traces
simulation-to-reality gap
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

proactive coding assistants
real-world developer behavior
simulation-to-reality gap
ProCodeBench
IDE interaction traces