CodeWatcher: IDE Telemetry Data Extraction Tool for Understanding Coding Interactions with LLMs

📅 2025-10-13
🤖 AI Summary
How can fine-grained, real-time interaction data between developers and code generation tools (CGTs) be collected without disrupting their workflow? This paper proposes a lightweight, non-intrusive client-server system. A VS Code extension captures semantically rich editing events, including code insertions and deletions, copy-paste operations, and focus transitions, while a Python-based RESTful API stores the structured telemetry in MongoDB. All components are containerized for scalability and ease of deployment. The key contribution is an implementation that supports session-level reconstruction and analysis of developer behavior, enabling continuous, low-overhead monitoring of developer-CGT interactions. This infrastructure supports behavioral modeling, productivity assessment, and responsible AI research in AI-assisted programming environments.
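The event types listed above can be made concrete with a small data structure. The following is a minimal sketch, not CodeWatcher's actual schema: the names `EventKind`, `TelemetryEvent`, `session_id`, `kind`, `timestamp`, and `payload` are hypothetical, chosen only to show how a typed editor event might be serialized to JSON by the client and parsed back by the REST API.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, Dict


class EventKind(str, Enum):
    """Hypothetical event categories mirroring those named in the summary."""
    INSERT = "insert"
    DELETE = "delete"
    PASTE = "paste"
    FOCUS_GAINED = "focus_gained"
    FOCUS_LOST = "focus_lost"


@dataclass
class TelemetryEvent:
    """One editor event as the extension might send it (hypothetical schema)."""
    session_id: str
    kind: EventKind
    timestamp: float  # seconds since the Unix epoch, taken on the client
    payload: Dict[str, Any] = field(default_factory=dict)  # kind-specific details

    def to_json(self) -> str:
        record = asdict(self)
        record["kind"] = self.kind.value  # store the enum as its plain string value
        return json.dumps(record, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "TelemetryEvent":
        record = json.loads(text)
        record["kind"] = EventKind(record["kind"])  # restore the enum member
        return cls(**record)


# Round trip: the JSON the client would POST, parsed back on the server side.
event = TelemetryEvent("s-1", EventKind.PASTE, 1700000000.0,
                       {"text": "print('hi')", "offset": 42})
wire = event.to_json()
assert TelemetryEvent.from_json(wire) == event
```

Keeping `payload` as a free-form dictionary lets each event kind carry its own fields without changing the envelope; a stricter design could define one dataclass per event kind.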

📝 Abstract
Understanding how developers interact with code generation tools (CGTs) requires detailed, real-time data on programming behavior, which is often difficult to collect without disrupting workflow. We present CodeWatcher, a lightweight, unobtrusive client-server system designed to capture fine-grained interaction events from within the Visual Studio Code (VS Code) editor. CodeWatcher logs semantically meaningful events such as insertions made by CGTs, deletions, copy-paste actions, and focus shifts, enabling continuous monitoring of developer activity without modifying user workflows. The system comprises a VS Code plugin, a Python-based RESTful API, and a MongoDB backend, all containerized for scalability and ease of deployment. By structuring and timestamping each event, CodeWatcher enables post-hoc reconstruction of coding sessions and facilitates rich behavioral analyses, including how and when CGTs are used during development. This infrastructure is crucial for supporting research on responsible AI, developer productivity, and the human-centered evaluation of CGTs. The demo, diagrams, and tool are available at https://osf.io/j2kru/overview.
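Post-hoc reconstruction works because every event is timestamped and can be replayed in order. The sketch below assumes, hypothetically, that each edit event records a character `offset`, the inserted `text` or deleted `length`, and a `source` flag separating CGT insertions from manual typing; CodeWatcher's real event fields may differ. It replays a single document's events in timestamp order to rebuild the final text and tallies inserted characters by source. The names `EditEvent` and `replay` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class EditEvent:
    """A single text edit; all field names are hypothetical."""
    timestamp: float
    kind: str               # "insert" or "delete"
    offset: int             # character position in the document
    text: str = ""          # inserted text (for "insert")
    length: int = 0         # characters removed (for "delete")
    source: str = "manual"  # "manual" or "cgt" (code generation tool)


def replay(events: Iterable[EditEvent], initial: str = "") -> Tuple[str, Dict[str, int]]:
    """Apply events in timestamp order to rebuild a single document.

    Returns the final text and the number of characters inserted per source.
    """
    buffer = initial
    inserted: Dict[str, int] = {}
    # sorted() is stable, so events with equal timestamps keep their logged order.
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "insert":
            buffer = buffer[:ev.offset] + ev.text + buffer[ev.offset:]
            inserted[ev.source] = inserted.get(ev.source, 0) + len(ev.text)
        elif ev.kind == "delete":
            buffer = buffer[:ev.offset] + buffer[ev.offset + ev.length:]
        else:
            raise ValueError(f"unsupported event kind: {ev.kind!r}")
    return buffer, inserted


# A tiny session: the developer types a header, a CGT completes the body,
# and the developer then edits the completion by hand.
session = [
    EditEvent(1.0, "insert", 0, text="def f():\n"),
    EditEvent(2.0, "insert", 9, text="    return 1\n", source="cgt"),
    EditEvent(3.0, "delete", 20, length=1),
    EditEvent(3.5, "insert", 20, text="42"),
]
final_text, chars_by_source = replay(session)
print(repr(final_text))   # 'def f():\n    return 42\n'
print(chars_by_source)    # {'manual': 11, 'cgt': 13}
```

Because the replay only needs ordered, timestamped events, the same loop could be stopped at any timestamp to inspect the document as it stood at that moment, for example just before and after a CGT suggestion was accepted.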

Problem

Captures fine-grained developer interactions with code generation tools
Enables continuous monitoring of coding behavior without workflow disruption
Facilitates post-hoc reconstruction of coding sessions for behavioral analysis

Innovation

Lightweight client-server system for IDE telemetry
Logs fine-grained coding interactions without workflow disruption
Containerized architecture enabling scalable behavioral analysis
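The client-server split can be illustrated with a standard-library stand-in for the backend. This is a minimal sketch under stated assumptions, not CodeWatcher's API: the route `/events`, the JSON-array batch format, the helpers `start_server` and `post_events`, and the in-memory `STORE` list (standing in for the MongoDB collection) are all hypothetical, and containerization is not shown.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict, List
from urllib.request import Request, urlopen

# In-memory stand-in for the MongoDB collection (hypothetical simplification).
STORE: List[Dict[str, Any]] = []
STORE_LOCK = threading.Lock()


class EventHandler(BaseHTTPRequestHandler):
    """Accepts POST /events with a JSON array of event objects."""

    def do_POST(self) -> None:
        if self.path != "/events":
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            batch = json.loads(self.rfile.read(length))
        except ValueError:  # covers malformed JSON and undecodable bytes
            self.send_error(400, "body must be JSON")
            return
        if not isinstance(batch, list) or not all(isinstance(e, dict) for e in batch):
            self.send_error(400, "expected a JSON array of event objects")
            return
        with STORE_LOCK:  # handlers run on separate threads
            STORE.extend(batch)
        body = json.dumps({"stored": len(batch)}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence per-request logging in this sketch


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve on a daemon thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_events(url: str, events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Client side: send one batch and return the decoded JSON reply."""
    req = Request(url, data=json.dumps(events).encode(),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    server = start_server()
    url = f"http://127.0.0.1:{server.server_address[1]}/events"
    print(post_events(url, [{"kind": "paste", "timestamp": 1.0}]))  # {'stored': 1}
    server.shutdown()
    server.server_close()
```

A batch endpoint like this lets a client buffer events locally and send them periodically, which is one way to keep network traffic out of the editing loop; whether CodeWatcher batches events or sends them one at a time is not stated in this summary.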
Manaal Basha
University of British Columbia, Kelowna
Aimeê M. Ribeiro
Federal University of Pará
Jeena Javahar
University of British Columbia, Kelowna
Cleidson R. B. de Souza
UFPA
CSCW, Software Engineering
Gema Rodríguez-Pérez
Assistant Professor, University of British Columbia (Okanagan)
Empirical Studies, Mining software repository, Code review, Bug-Introducing Changes, human aspects in SE