🤖 AI Summary
This work investigates the effectiveness of Retrieval-Augmented Generation (RAG) in enhancing large language models' (LLMs') code generation using the documentation of less common, open-source Python libraries. Motivated by real-world development scenarios in which practitioners consult documentation for infrequently used APIs, the authors introduce the first systematic evaluation framework for this low-frequency API setting. Their method combines multi-granularity document chunking, code-example-driven retrieval, and a dedicated re-ranking strategy. Experiments show that RAG improves LLMs' code generation accuracy on tasks involving less common APIs by 83%-220%. Code examples prove to be the most informative documentation element, substantially outperforming descriptive text and parameter lists, while LLMs exhibit notable robustness to documentation noise. The study provides the first empirical validation of RAG's efficacy in low-frequency API contexts and proposes a practical path for documentation optimization centered on the quality and diversity of code examples.
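The retrieval side of such a pipeline can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the doctest-based example detection, the paragraph-level chunking, and the Jaccard scoring with an `example_boost` re-rank weight are all assumptions made for the sketch.

```python
import re
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    granularity: str  # "description", "parameters", or "example"

def chunk_api_doc(doc: str) -> list[Chunk]:
    """Split one API's documentation into chunks of different granularities.
    Assumes doctest-style '>>>' lines mark code examples (an illustrative
    convention, not the paper's exact format)."""
    chunks = []
    for para in filter(None, (p.strip() for p in doc.split("\n\n"))):
        if ">>>" in para:
            kind = "example"
        elif para.lower().startswith(("parameters", "args")):
            kind = "parameters"
        else:
            kind = "description"
        chunks.append(Chunk(para, kind))
    return chunks

def lexical_score(query: str, chunk: Chunk) -> float:
    """Jaccard token overlap; a production system would use BM25 or embeddings."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk.text.lower()))
    return len(q & c) / len(q | c) if q | c else 0.0

def retrieve(query: str, chunks: list[Chunk], k: int = 3,
             example_boost: float = 1.5) -> list[Chunk]:
    """Top-k retrieval with a re-rank that up-weights code examples,
    mirroring the finding that examples carry the most signal."""
    return sorted(
        chunks,
        key=lambda ch: lexical_score(query, ch)
        * (example_boost if ch.granularity == "example" else 1.0),
        reverse=True,
    )[:k]
```

The multiplicative boost is one simple way to encode a preference for example chunks at re-rank time; the reported result that examples dominate descriptive text suggests this kind of granularity-aware weighting, though the actual strategy and weights would need to come from the paper.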
📝 Abstract
Retrieval-augmented generation (RAG) has increasingly shown its power in extending large language models' (LLMs') capability beyond their pre-trained knowledge. Existing works have shown that RAG can help with software development tasks such as code generation, code updating, and test generation. Yet, the effectiveness of adapting LLMs to fast-evolving or less common API libraries using RAG remains unknown. To bridge this gap, we take an initial step toward studying this unexplored yet practical setting: when developers code with a less common library, they often consult its API documentation; likewise, when LLMs are allowed to look up API documentation via RAG, to what extent can their performance be improved? To mimic such a setting, we select four less common open-source Python libraries with a total of 1017 eligible APIs. We study the factors that affect the effectiveness of using the documentation of less common API libraries as additional knowledge for retrieval and generation. Our in-depth study yields interesting findings: (1) RAG helps improve LLMs' performance by 83%-220%. (2) Example code contributes the most to advancing LLMs, rather than the descriptive text and parameter lists in the API documentation. (3) LLMs can sometimes tolerate mild noise (typos in descriptions or incorrect parameters) by referencing their pre-trained knowledge or the surrounding document context. Finally, we suggest that developers pay more attention to the quality and diversity of the code examples in API documentation. The study sheds light on future low-code software development workflows.
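On the generation side, the retrieved documentation chunks are typically stitched into the model's prompt. The sketch below shows one plausible assembly step; the prompt template wording and the examples-first ordering are illustrative assumptions (the ordering reflects finding (2) above), not the paper's exact prompt.

```python
def build_prompt(task: str, retrieved: list[tuple[str, str]]) -> str:
    """Assemble retrieved (granularity, text) documentation chunks into a
    code-generation prompt, placing code examples first since they were
    found to contribute the most."""
    # False (0) sorts before True (1), so "example" chunks come first.
    ordered = sorted(retrieved, key=lambda item: item[0] != "example")
    context = "\n\n".join(f"[{kind}]\n{text}" for kind, text in ordered)
    return (
        "You are writing code with a less common Python library.\n"
        "Relevant API documentation:\n\n"
        f"{context}\n\n"
        f"Task: {task}\n"
        "Respond with a complete, runnable Python solution."
    )
```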