Experimental Analysis of Productive Interaction Strategy with ChatGPT: User Study on Function and Project-level Code Generation Tasks

📅 2025-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM prompting research focuses narrowly on function-level tasks, overlooking project-level complexities (such as multi-class dependencies) and the human-LLM interaction (HLI) characteristics critical to real-world software development. Method: We conduct an empirical study of developers interacting with ChatGPT on both function-level and project-level code generation. Using a novel project-level benchmark incorporating diverse inter-class dependencies, we collect fine-grained behavioral data via synchronized screen recordings and chat logs in controlled comparative experiments. Contribution/Results: We identify three HLI features that significantly impact generation efficiency, establish a fine-grained taxonomy of 29 error types, and derive five actionable, empirically grounded prompting guidelines. These findings provide both theoretical insight and practical guidance for enhancing LLM productivity in authentic software engineering contexts.

📝 Abstract
The application of Large Language Models (LLMs) is growing in the productive completion of Software Engineering (SE) tasks. Yet, studies investigating productive prompting techniques have often employed a limited problem space, primarily focusing on well-known prompting patterns and mainly targeting function-level SE practices. We identify significant gaps in real-world workflows that involve complexities beyond the class level (e.g., multi-class dependencies) and diverse features that can impact Human-LLM Interaction (HLI) processes in code generation. To address these issues, we designed an experiment that comprehensively analyzed HLI features with respect to code generation productivity. Our study presents two project-level benchmark tasks, extending beyond function-level evaluations. We conducted a user study with 36 participants from diverse backgrounds, asking them to solve the assigned tasks by interacting with the GPT assistant using specific prompting patterns. We also examined the participants' experience and their behavioral features during interactions by analyzing screen recordings and GPT chat logs. Our statistical and empirical investigation revealed (1) that three out of 15 HLI features significantly impacted productivity in code generation; (2) five primary guidelines for enhancing productivity in HLI processes; and (3) a taxonomy of 29 runtime and logic errors that can occur during HLI processes, along with suggested mitigation plans.
Problem

Research questions and friction points this paper is trying to address.

Investigates productive prompting techniques for ChatGPT in code generation
Addresses gaps in real-world workflows beyond function-level SE tasks
Analyzes Human-LLM interaction features impacting code generation productivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Project-level benchmark tasks for code generation
User study with 36 participants from diverse backgrounds interacting with GPT
Analysis of HLI features impacting productivity
Sangwon Hyun
The University of Adelaide, Australia
Hyunjun Kim
Korea Advanced Institute of Science and Technology, Republic of Korea
Jinhyuk Jang
Korea Advanced Institute of Science and Technology, Republic of Korea
Hyojin Choi
Korea Advanced Institute of Science and Technology, Republic of Korea
M. Ali Babar
Professor of Software Engineering, The University of Adelaide, Australia
Software Security & Privacy · Big Data Platforms & Architectures · Empirical Software Engineering · Software Architecture