OASBuilder: Generating OpenAPI Specifications from Online API Documentation with Large Language Models

📅 2025-07-07
🤖 AI Summary
Online API documentation is predominantly published as unstructured HTML, so converting it into structured specifications (e.g., OpenAPI) for AI agents and automated systems requires substantial manual effort. Method: We propose a conversion framework that combines domain knowledge, large language models (LLMs), and rules aware of web-page structure, enabling end-to-end OpenAPI generation from lengthy, heterogeneous API documentation. The approach uses multi-stage prompt engineering, structure-guided rule constraints, and format validation to improve the accuracy and consistency of LLM output. Contribution/Results: Evaluated on hundreds of real-world APIs, the method generalizes well and produces valid OpenAPI specifications that capture most of the information in the original documentation. Deployed in an enterprise environment, it has saved thousands of person-hours of manual effort, and the generated specifications support LLM-driven service integration and tool-mediated API invocation.
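
The summary mentions format validation of the generated output. As a minimal sketch of what such a check could look like, assuming the target is OpenAPI 3.0.x, the hypothetical `validate_openapi` function below checks a handful of fields that the OpenAPI 3.0 specification marks as required (`openapi`, `info.title`, `info.version`, `paths`, and `responses` on each operation) plus the rule that path keys begin with `/`. It is not OASBuilder's validator, whose rules the summary does not describe; a full schema validator would check far more.

```python
from __future__ import annotations

from typing import Any

# Operation keys allowed in an OpenAPI 3.0 Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def validate_openapi(spec: dict[str, Any]) -> list[str]:
    """Returns human-readable problems; an empty list means these basic checks passed."""
    errors: list[str] = []
    version = spec.get("openapi")
    if not isinstance(version, str) or not version.startswith("3.0"):
        errors.append("missing or unsupported 'openapi' version (expected 3.0.x)")
    info = spec.get("info")
    if not isinstance(info, dict):
        errors.append("missing 'info' object")
    else:
        for key in ("title", "version"):
            if not isinstance(info.get(key), str):
                errors.append(f"info.{key} must be a string")
    paths = spec.get("paths")
    if not isinstance(paths, dict):
        errors.append("missing 'paths' object")
        return errors
    for path, item in paths.items():
        if not str(path).startswith("/"):
            errors.append(f"path {path!r} must start with '/'")
        if not isinstance(item, dict):
            errors.append(f"path {path!r} must map to an object")
            continue
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # path-level keys such as 'summary' or 'parameters'
            if not isinstance(operation, dict) or "responses" not in operation:
                errors.append(f"{method.upper()} {path}: missing 'responses'")
    return errors


if __name__ == "__main__":
    good = {
        "openapi": "3.0.3",
        "info": {"title": "Example API", "version": "1.0"},
        "paths": {"/users/{id}": {"get": {"responses": {"200": {"description": "OK"}}}}},
    }
    bad = {"openapi": "3.0.3", "info": {"title": "Example API"}, "paths": {"users": {"post": {}}}}
    print(validate_openapi(good))  # prints []
    print(validate_openapi(bad))
```

A pipeline could feed a non-empty error list back to the LLM as a repair prompt; that loop is an assumption here, not something the summary states.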

๐Ÿ“ Abstract
AI agents and business automation tools interacting with external web services require standardized, machine-readable information about their APIs in the form of API specifications. However, the information about APIs available online is often presented as unstructured, free-form HTML documentation, requiring external users to spend significant time manually converting it into a structured format. To address this, we introduce OASBuilder, a novel framework that transforms long and diverse API documentation pages into consistent, machine-readable API specifications. This is achieved through a carefully crafted pipeline that integrates large language models with rule-based algorithms guided by domain knowledge of the structure of documentation webpages. Our experiments demonstrate that OASBuilder generalizes well across hundreds of APIs and produces valid OpenAPI specifications that encapsulate most of the information from the original documentation. OASBuilder has been successfully deployed in an enterprise environment, saving thousands of hours of manual effort and making hundreds of complex enterprise APIs accessible as tools for LLMs.
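
The abstract describes rule-based algorithms guided by knowledge of how documentation pages are structured. One plausible rule of that kind, sketched here as an assumption rather than the authors' algorithm, is to split a long page at `<h1>`-`<h3>` headings, drop navigation and script noise, and keep only sections that contain an endpoint signature such as `GET /users/{id}`, so each chunk handed to an LLM stays short. `Section`, `SectionSplitter`, `split_endpoints`, and the endpoint regular expression are hypothetical names and heuristics; only Python's standard `html.parser` and `re` modules are used.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical heuristic: many docs show an endpoint as "METHOD /path".
ENDPOINT_RE = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS)\s+(/\S*)")
HEADING_TAGS = {"h1", "h2", "h3"}
SKIP_TAGS = {"script", "style", "nav", "footer"}  # page chrome, not API content


@dataclass
class Section:
    heading: str
    text_parts: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(self.text_parts)

    @property
    def endpoint(self) -> tuple[str, str] | None:
        """First "METHOD /path" pair found in the heading or body, if any."""
        match = ENDPOINT_RE.search(f"{self.heading} {self.text}")
        return (match.group(1), match.group(2)) if match else None


class SectionSplitter(HTMLParser):
    """Splits a documentation page into sections at <h1>-<h3> headings."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: list[Section] = [Section(heading="")]  # text before any heading
        self._in_heading = False
        self._heading_parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in HEADING_TAGS:
            self._in_heading = True
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in HEADING_TAGS and self._in_heading:
            # A closed heading starts a new section.
            self._in_heading = False
            self.sections.append(Section(heading=" ".join(self._heading_parts)))

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_heading:
            self._heading_parts.append(text)
        else:
            self.sections[-1].text_parts.append(text)


def split_endpoints(html: str) -> list[Section]:
    """Returns only the sections that contain an endpoint signature."""
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return [s for s in parser.sections if s.endpoint is not None]


if __name__ == "__main__":
    page = """
    <nav>Home | Docs</nav>
    <h1>Users API</h1><p>Manage users.</p>
    <h2>Get a user</h2><pre>GET /users/{id}</pre><p>Returns one user.</p>
    <h2>Create a user</h2><pre>POST /users</pre>
    <script>track();</script>
    """
    for section in split_endpoints(page):
        print(section.heading, section.endpoint)
    # prints: Get a user ('GET', '/users/{id}')
    #         Create a user ('POST', '/users')
```

Heuristics like this break on pages that present endpoints in tables or interactive widgets, so in a sketch like this they only narrow what the LLM has to read rather than replace it.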

Problem

Convert unstructured API docs to machine-readable specs
Automate OpenAPI spec generation using LLMs
Reduce manual effort in API documentation processing

Innovation

Uses LLMs to convert HTML docs to API specs
Combines rule-based algorithms with domain knowledge
Automates OpenAPI spec generation for enterprises
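
The innovation list pairs LLM-based conversion with rule-based processing. A minimal sketch of the final assembly step, under stated assumptions: a hypothetical `extract` callable wraps an LLM, receives one documentation chunk, and returns a JSON object with `method`, `path`, and `operation` keys; `build_spec` merges these into one OpenAPI 3.0 document, skipping malformed outputs. The JSON shape, the function names, and the keep-first policy for duplicates are illustrative choices, not OASBuilder's published interface. The hypothetical `fake_extract` stands in for a real model call so the example runs offline.

```python
from __future__ import annotations

import json
from typing import Any, Callable

# Hypothetical contract: the extractor (an LLM call in practice) maps one
# documentation chunk to a JSON string {"method": ..., "path": ..., "operation": {...}}.
ExtractFn = Callable[[str], str]


def build_spec(chunks: list[str], extract: ExtractFn, title: str, version: str) -> dict[str, Any]:
    """Runs the extractor on every chunk and merges results into one OpenAPI 3.0 document."""
    spec: dict[str, Any] = {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": {},
    }
    for chunk in chunks:
        try:
            result = json.loads(extract(chunk))
            method = result["method"].lower()
            path = result["path"]
            operation = result["operation"]
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            continue  # malformed extractor output: skipped in this sketch
        if not isinstance(path, str) or not isinstance(operation, dict):
            continue
        # Several chunks can describe the same path with different methods.
        path_item = spec["paths"].setdefault(path, {})
        path_item.setdefault(method, operation)  # keep the first duplicate
    return spec


def fake_extract(chunk: str) -> str:
    """Offline stand-in for an LLM: parses chunks shaped like "METHOD /path"."""
    if not chunk.startswith(("GET ", "POST ")):
        return "not json"
    method, path = chunk.split(" ", 1)
    operation = {"summary": chunk, "responses": {"200": {"description": "OK"}}}
    return json.dumps({"method": method, "path": path, "operation": operation})


if __name__ == "__main__":
    docs = ["GET /users/{id}", "POST /users", "??? garbled chunk", "GET /users/{id}"]
    print(json.dumps(build_spec(docs, fake_extract, "Users API", "1.0"), indent=2))
```

Skipping bad outputs keeps the sketch short; a production pipeline would more likely re-prompt or log the failure so that no endpoint is silently lost.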