Who's the Leader? Analyzing Novice Workflows in LLM-Assisted Debugging of Machine Learning Code

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the cognitive interaction dynamics between novice machine learning (ML) engineers and large language models (LLMs) during debugging, focusing on how "leading the LLM" versus "led-by the LLM" interaction patterns induce over- or under-reliance, degrading users' verification ability and conceptual understanding. In a formative study with eight novices, the authors analyze task logs and apply heuristic coding to characterize human-LLM cognitive imbalance in a high-complexity, low-verifiability task domain, a phenomenon not previously characterized. Results show that a majority of debugging iterations involved passive acceptance of LLM suggestions, producing verification gaps and conceptual misconceptions. Building on these findings, the authors propose three interaction-enhancement strategies: structured prompting, real-time feedback integration, and explicit verification scaffolding, intended to promote cognitive engagement and downstream conceptual learning.

📝 Abstract
While LLMs are often touted as tools for democratizing specialized knowledge to beginners, their actual effectiveness for improving task performance and learning is still an open question. It is known that novices engage with LLMs differently from experts, with prior studies reporting meta-cognitive pitfalls that affect novices' ability to verify outputs and prompt effectively. We focus on a task domain, machine learning (ML), that embodies both high complexity and low verifiability to understand the impact of LLM assistance on novices. Given a buggy ML script and open access to ChatGPT, we conduct a formative study with eight novice ML engineers to understand their reliance on, interactions with, and perceptions of the LLM. We find that user actions can be roughly categorized into leading the LLM and being led by the LLM, and we further investigate how these patterns affect reliance outcomes such as over- and under-reliance. These results have implications for novices' cognitive engagement in LLM-assisted tasks and potential negative effects on downstream learning. Lastly, we pose potential augmentations to the novice-LLM interaction paradigm to promote cognitive engagement.
Problem

Research questions and friction points this paper addresses:

Assessing LLM effectiveness for novice task performance and learning
Exploring novice-expert differences in LLM interaction and verification
Investigating reliance outcomes in LLM-assisted ML debugging by novices
Innovation

Methods, ideas, or system contributions that make the work stand out:

Analyzing novice workflows in LLM-assisted debugging
Categorizing user actions as leading the LLM or led by the LLM
Proposing augmentations to novice-LLM interaction paradigm