🤖 AI Summary
This study systematically investigates, for the first time, the core practical challenges developers face when using the OpenAI API: prompt engineering complexity, token-level cost unpredictability, output non-determinism, and model opacity. These challenges have previously lacked empirical grounding.
Method: Leveraging 2,874 high-quality Stack Overflow Q&A pairs, we apply human annotation, LDA topic modeling, statistical analysis, and qualitative induction to construct a nine-category problem taxonomy and identify fine-grained challenges within each.
Contribution/Results: We propose the first empirically grounded framework characterizing LLM usage difficulties for API practitioners, revealing a fundamental triadic tension among cost efficiency, controllability, and explainability. Our findings yield actionable optimization strategies for developers, concrete recommendations for API vendors to improve design and documentation, and novel research directions—including human-AI collaboration and trustworthy LLMs—for the broader community.
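The token-level cost concern above stems from per-token billing: a request's price depends on how many prompt and completion tokens it consumes, and the completion length is not known in advance. A minimal sketch of the arithmetic, using hypothetical per-1K-token prices (not real OpenAI pricing):

```python
# Hypothetical per-1K-token prices in USD (illustrative assumption,
# not actual OpenAI pricing, which varies by model and over time).
PRICE_PER_1K = {"prompt": 0.0015, "completion": 0.002}

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate a single request's cost in USD from its token counts."""
    return (prompt_tokens / 1000 * PRICE_PER_1K["prompt"]
            + completion_tokens / 1000 * PRICE_PER_1K["completion"])

# A 1,200-token prompt with an 800-token completion:
print(round(estimate_cost(1200, 800), 4))  # → 0.0034
```

The unpredictability arises because `completion_tokens` is only known after the response arrives, so developers can bound costs beforehand only via parameters such as a maximum output length.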
📝 Abstract
The rapid advancement of large language models (LLMs), represented by OpenAI's GPT series, has significantly impacted various domains such as natural language processing, software development, education, healthcare, finance, and scientific research. However, OpenAI APIs introduce unique challenges that differ from those of traditional APIs, such as the complexity of prompt engineering, token-based cost management, non-deterministic outputs, and black-box operation. To the best of our knowledge, the challenges developers encounter when using OpenAI APIs have not been explored in previous empirical studies. To fill this gap, we conduct the first comprehensive empirical study by analyzing 2,874 OpenAI API-related discussions from the popular Q&A forum Stack Overflow. We first examine the popularity and difficulty of these posts. After manually categorizing them into nine OpenAI API-related categories, we identify specific challenges associated with each category through topic modeling analysis. Based on our empirical findings, we finally propose actionable implications for developers, LLM vendors, and researchers.
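The topic modeling step can be illustrated with a minimal collapsed Gibbs sampler for LDA. This is a pure-Python sketch on a toy corpus; the study's actual corpus, preprocessing, and hyperparameters are not specified here, and production work would typically use a library such as gensim or scikit-learn instead.

```python
import random
from collections import defaultdict

# Toy stand-in for preprocessed Stack Overflow post texts (hypothetical).
docs = [
    "prompt template shot prompt".split(),
    "token cost billing token limit".split(),
    "prompt instruction prompt output".split(),
    "cost token pricing usage cost".split(),
]

K = 2                      # number of topics (assumed small for the toy data)
ALPHA, BETA = 0.1, 0.01    # symmetric Dirichlet priors
ITERS = 200
random.seed(0)

vocab = sorted({w for d in docs for w in d})
V = len(vocab)

# z[d][i]: topic of the i-th word of doc d, plus the sampler's count tables.
z = [[random.randrange(K) for _ in d] for d in docs]
ndk = [[0] * K for _ in docs]               # doc-topic counts
nkw = [defaultdict(int) for _ in range(K)]  # topic-word counts
nk = [0] * K                                # words per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

for _ in range(ITERS):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
            # Full conditional: p(k | rest) ∝ (ndk + α)(nkw + β)/(nk + Vβ)
            weights = [(ndk[d][t] + ALPHA) * (nkw[t][w] + BETA) / (nk[t] + V * BETA)
                       for t in range(K)]
            k = random.choices(range(K), weights)[0]
            z[d][i] = k
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1

# Top words per topic summarize the discovered challenge themes.
top_words = {t: sorted(vocab, key=lambda w: -nkw[t][w])[:3] for t in range(K)}
print(top_words)
```

On real data, the learned topics within each manually assigned category would then be inspected and labeled by the annotators to name the fine-grained challenges.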