🤖 AI Summary
This work addresses the poor robustness of existing API discovery methods under mixed runtime traffic, such as scenarios where multiple applications share a common observation point. Static approaches depend on source code or formal specifications, which are often unavailable for closed-source systems, and tend to over-approximate API usage, producing high false-positive rates; dynamic black-box techniques lose accuracy in complex environments. To overcome these limitations, the authors propose APISENSOR, an unsupervised black-box framework that reconstructs Web APIs from mixed traffic through traffic denoising and normalization, graph-based structural modeling, and a two-stage clustering strategy. APISENSOR improves both accuracy and stability of automatic API discovery under mixed traffic, and it also uncovered inconsistencies in the official API documentation of a real application. Evaluations on over 10,000 requests across six web applications show an average Group Accuracy Precision of 95.92% and an F1-score of 94.91%, with the lowest performance variance among the compared methods, outperforming ten baselines.
📝 Abstract
Large Language Model (LLM)-based agents increasingly rely on APIs to operate complex web applications, but the rapid evolution of these applications often leaves API documentation incomplete or inconsistent. Existing work falls into two categories: (1) static, white-box approaches based on source code or formal specifications, and (2) dynamic, black-box approaches that infer APIs from runtime traffic. Static approaches rely on internal artifacts, which are typically unavailable for closed-source systems, and often over-approximate API usage, resulting in high false-positive rates. Dynamic black-box API discovery applies more broadly, but its robustness degrades in complex environments where shared collection points aggregate traffic from multiple applications. To improve robustness under mixed runtime traffic, we propose APISENSOR, an unsupervised black-box API discovery framework that reconstructs application APIs from observed traffic. APISENSOR performs structured analysis over complex traffic, combining traffic denoising and normalization with a graph-based two-stage clustering process to recover accurate APIs. We evaluated APISENSOR on six web applications using over 10,000 runtime requests with simulated mixed-traffic noise. The results show that APISENSOR significantly improves discovery accuracy, achieving an average Group Accuracy Precision of 95.92% and an F1-score of 94.91% and outperforming state-of-the-art methods. Across different applications and noise settings, APISENSOR achieves the lowest performance variance and at most an 8.11-point FGA drop, the best robustness among 10 baselines. Ablation studies confirm that each component is essential. Furthermore, APISENSOR revealed API documentation inconsistencies in a real application, which were later confirmed by community developers.
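To make the idea of traffic normalization concrete, the following is a minimal sketch of how observed requests might be collapsed into endpoint templates before any clustering takes place. It is not APISENSOR's actual algorithm, which the abstract does not specify: the regular-expression heuristics, the `{id}` placeholder, and the function names `normalize_path` and `group_requests` are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical heuristics for path segments that look like variable values.
# These patterns are illustrative assumptions, not taken from the paper.
_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
_LONG_HEX = re.compile(r"^[0-9a-fA-F]{16,}$")


def normalize_path(url: str) -> str:
    """Replace value-like path segments with a generic {id} placeholder."""
    path = urlsplit(url).path  # drops scheme, host, query, and fragment
    segments = [s for s in path.split("/") if s]
    normalized = []
    for seg in segments:
        if _NUMERIC.match(seg) or _UUID.match(seg) or _LONG_HEX.match(seg):
            normalized.append("{id}")
        else:
            normalized.append(seg)
    return "/" + "/".join(normalized)


def group_requests(requests):
    """Group (method, url) pairs by HTTP method and normalized path template."""
    groups = defaultdict(list)
    for method, url in requests:
        groups[(method.upper(), normalize_path(url))].append(url)
    return dict(groups)


if __name__ == "__main__":
    observed = [
        ("GET", "https://shop.example/api/orders/42?expand=items"),
        ("GET", "https://shop.example/api/orders/77"),
        ("POST", "https://shop.example/api/orders"),
    ]
    for key, urls in group_requests(observed).items():
        print(key, len(urls))
```

Fixed patterns like these are brittle under mixed traffic, because segments from different applications can look alike while meaning different things; that limitation is the kind of problem a structural, clustering-based approach is intended to address.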