"Always Nice and Confident, Sometimes Wrong": Developer's Experiences Engaging Large Language Models (LLMs) Versus Human-Powered Q&A Platforms for Coding Support

📅 2023-09-24

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study investigates how developers distinguish between and combine AI-based Q&A systems (e.g., ChatGPT) and human-powered platforms (e.g., Stack Overflow, SO) during real-world coding tasks. Method: Through a thematic analysis of more than 1,700 Reddit posts, we compare the two platforms on response quality, interaction efficiency, and trustworthiness. Contribution/Results: We identify complementary strengths. ChatGPT delivers fast, clear responses in a respectful tone, but its answers are hard to verify and its overly confident tone undermines trust; SO provides community-vetted answers backed by a voting system, but responses arrive more slowly and the tone is inconsistent. Based on these findings, we propose a “hybrid workflow” that combines the efficiency of LLMs with SO’s traceability and auditability, and we derive six empirically grounded design principles for generative-AI-powered coding assistants. This work offers an empirical basis for designing human–AI collaboration in AI-augmented software development tools.

📝 Abstract

Software engineers have historically relied on human-powered Q&A platforms like Stack Overflow (SO) as coding aids. With the rise of generative AI, developers have started to adopt AI chatbots, such as ChatGPT, in their software development process. Recognizing the potential parallels between human-powered Q&A platforms and AI-powered question-based chatbots, we investigate and compare how developers integrate this assistance into their real-world coding experiences by conducting a thematic analysis of 1700+ Reddit posts. Through a comparative study of SO and ChatGPT, we identified each platform's strengths, use cases, and barriers. Our findings suggest that ChatGPT offers fast, clear, comprehensive responses and fosters a more respectful environment than SO. However, concerns about ChatGPT's reliability stem from its overly confident tone and the absence of validation mechanisms like SO's voting system. Based on these findings, we synthesized the design implications for future GenAI code assistants and recommend a workflow leveraging each platform's unique features to improve developer experiences.

Problem

Research questions and friction points this paper is trying to address.

Compare developer experiences with LLMs and human Q&A platforms.

Assess strengths, use cases, and barriers of ChatGPT and Stack Overflow.

Identify design implications for future AI code assistants.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparative analysis of LLMs and human Q&A

Thematic analysis of Reddit developer posts

Design implications for GenAI code assistants
