Automating API Documentation from Crowdsourced Knowledge

📅 2026-01-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Official API documentation is often outdated and incomplete, so it fails to meet developers' needs. This work proposes AutoDoc, an automated approach that combines fine-grained API knowledge extraction, dense retrieval, and large language model (LLM) summarization to generate structured documentation from Stack Overflow discussions. A fine-tuned dense retrieval model identifies seven types of API knowledge in posts, and two dedicated components mitigate LLM hallucination and reduce redundancy in the generated text. Evaluated against five baselines on 48 APIs, the generated documents are up to 77.7% more accurate and 9.5% less duplicated, and 34.4% of the knowledge they contain is not covered by the official documentation. A user study found them more comprehensive, concise, and helpful than the baselines' output.

📝 Abstract

API documentation is crucial for developers to learn and use APIs. However, it is known that many official API documents are obsolete and incomplete. To address this challenge, we propose a new approach called AutoDoc that generates API documents with API knowledge extracted from online discussions on Stack Overflow (SO). AutoDoc leverages a fine-tuned dense retrieval model to identify seven types of API knowledge from SO posts. Then, it uses GPT-4o to summarize the API knowledge in these posts into concise text. Meanwhile, we designed two specific components to handle LLM hallucination and redundancy in generated content. We evaluated AutoDoc against five comparison baselines on 48 APIs of different popularity levels. Our results indicate that the API documents generated by AutoDoc are up to 77.7% more accurate, 9.5% less duplicated, and contain 34.4% knowledge not covered by the official documents. We also measured the sensitivity of AutoDoc to the choice of different LLMs. We found that while larger LLMs produce higher-quality API documents, AutoDoc enables smaller open-source models (e.g., Mistral-7B-v0.3) to achieve comparable results. Finally, we conducted a user study to evaluate the usefulness of the API documents generated by AutoDoc. All participants found API documents generated by AutoDoc to be more comprehensive, concise, and helpful than the comparison baselines. This highlights the feasibility of utilizing LLMs for API documentation with careful design to counter LLM hallucination and information redundancy.
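
The pipeline described in the abstract (classify knowledge in posts, then filter redundant and unsupported text) can be illustrated with a small sketch. This is not AutoDoc's implementation: a bag-of-words vector stands in for the fine-tuned dense retrieval model, the two knowledge-type labels and their prototype phrases are hypothetical (the abstract states that there are seven types but does not list them), and the token-overlap grounding check is only one assumed way a hallucination filter might work. The thresholds are arbitrary illustrative values, and the LLM summarization step is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge types with prototype phrases (assumption, not from the paper).
KNOWLEDGE_TYPES = {
    "usage": "how to call use example code snippet",
    "pitfall": "error exception bug fails unexpected warning",
}


def embed(text):
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(sentence):
    """Assign the knowledge type whose prototype is most similar to the sentence."""
    vec = embed(sentence)
    return max(KNOWLEDGE_TYPES, key=lambda k: cosine(vec, embed(KNOWLEDGE_TYPES[k])))


def drop_redundant(sentences, threshold=0.8):
    """Keep a sentence only if it is not too similar to one already kept."""
    kept = []
    for s in sentences:
        if all(cosine(embed(s), embed(k)) < threshold for k in kept):
            kept.append(s)
    return kept


def is_grounded(summary_sentence, source_posts, threshold=0.5):
    """Assumed hallucination check: enough summary tokens must occur in the sources."""
    tokens = set(embed(summary_sentence))
    if not tokens:
        return False
    source_tokens = set()
    for post in source_posts:
        source_tokens |= set(embed(post))
    return len(tokens & source_tokens) / len(tokens) >= threshold


if __name__ == "__main__":
    posts = ["Call close() to release the handle"]
    print(classify("This call fails with an exception on null input"))
    print(drop_redundant([posts[0], posts[0], "Timeouts default to 30 seconds"]))
    print(is_grounded("close releases the handle", posts))
```

In a real system the `embed` function would be replaced by a learned sentence encoder, and the grounding check would more plausibly compare generated claims against retrieved posts semantically rather than by raw token overlap.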

Problem

Research questions and friction points this paper is trying to address.

API documentation

outdated documentation

incomplete documentation

developer knowledge

crowdsourced knowledge

Innovation

Methods, ideas, or system contributions that make the work stand out.

API documentation

crowdsourced knowledge

LLM hallucination mitigation