🤖 AI Summary
Empirical evidence on how developers use large language models (LLMs) in real-world software development remains scarce. Method: We constructed DevChat, a large-scale dataset of developer–ChatGPT conversations shared on GitHub (2,547 unique shared ChatGPT links, May 2023–June 2024), and applied a mixed-methods approach combining qualitative coding, statistical analysis, and task modeling. Contribution/Results: We identified five categories of developers' purposes for sharing ChatGPT conversations during software development and established a mapping framework linking three dimensions: data sources, development activities, and software engineering (SE) tasks. Our analysis characterizes ChatGPT usage on GitHub, including usage trends, the distribution of prompt turns, and the descriptions accompanying shared links, as well as common application scenarios such as code generation, debugging, and documentation writing. This work addresses a gap in empirical LLM research in authentic development contexts and provides a foundation for designing, optimizing, and integrating AI-powered programming tools into software engineering workflows.
📝 Abstract
The advent of Large Language Models (LLMs) has introduced a new paradigm in software engineering, with generative AI tools like ChatGPT gaining widespread adoption among developers. While ChatGPT's potential has been extensively discussed, there is limited empirical evidence on how developers actually use it in practice. This study bridges this gap by conducting a large-scale empirical analysis of ChatGPT-assisted development activities, leveraging a curated dataset, DevChat, comprising 2,547 unique shared ChatGPT links collected from GitHub between May 2023 and June 2024. Our study examines the characteristics of ChatGPT usage on GitHub (including its usage trend over time, the distribution of prompt turns, and the descriptions accompanying shared links) and identifies five categories of developers' purposes for sharing developer–ChatGPT conversations during software development. Additionally, we analyze the development-related activities in which developers shared ChatGPT links to facilitate their workflows. We then establish a mapping framework among the data sources, activities, and software engineering (SE) tasks associated with these shared ChatGPT links. Our study offers a comprehensive view of ChatGPT's application in real-world software development scenarios and provides a foundation for its future integration into software development workflows.