🤖 AI Summary
This work proposes AFGNN, a novel framework designed to detect defects and security vulnerabilities in Java code caused by API misuse. The approach introduces the API Flow Graph (AFG), a new representation that integrates the execution order of API calls with data flow and control flow information. By combining graph neural networks, self-supervised pre-training, and clustering algorithms, AFGNN effectively identifies both known and previously unseen anomalous API usage patterns. Experimental results demonstrate that AFGNN significantly outperforms existing small language models and specialized detection tools on mainstream API usage datasets, exhibiting superior generalization capability and detection accuracy.
📝 Abstract
Application Programming Interfaces (APIs) are crucial to software development: they let new applications integrate with existing systems by reusing tried-and-tested code, which saves development time and increases software safety. In particular, the Java standard library APIs, along with numerous third-party APIs, are used extensively in enterprise application development. However, their misuse remains a significant source of bugs and vulnerabilities. Furthermore, because official API documentation contains few examples, developers often rely on online portals and generative AI models to learn unfamiliar APIs, and the examples they find there may introduce unintentional errors into the software. In this paper, we present AFGNN, a novel Graph Neural Network (GNN)-based framework for efficiently detecting API misuses in Java code. AFGNN models API usage patterns with a new API Flow Graph (AFG) representation that captures the API execution sequence together with the data and control flow information present in the code. AFGNN is pre-trained in a self-supervised manner on the AFG representation so that it can compute embeddings for unknown API usage examples and cluster them to identify different usage patterns. Experiments on popular API usage datasets show that AFGNN significantly outperforms state-of-the-art small language models and API misuse detectors.
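To make the AFG idea more concrete, the following is a minimal, hand-built sketch of a graph that records API calls with sequence, data, and control edges, the three kinds of information the abstract says an AFG captures. It is not the paper's construction: the class `ApiFlowGraph`, the edge labels, the two example graphs, and the rule-based check `is_next_guarded` are all hypothetical names chosen for illustration, and the simple rule stands in for the GNN embedding and clustering that AFGNN actually uses.

```python
from dataclasses import dataclass, field

# Hypothetical edge labels, one per kind of information the abstract mentions.
SEQUENCE, DATA, CONTROL = "sequence", "data", "control"


@dataclass
class ApiFlowGraph:
    """Toy graph of API calls (illustrative only, not the paper's AFG format)."""
    nodes: list = field(default_factory=list)  # API call names in execution order
    edges: set = field(default_factory=set)    # (src_index, dst_index, kind)

    def add_call(self, api_name):
        # Append the call and return its node index.
        self.nodes.append(api_name)
        return len(self.nodes) - 1

    def add_edge(self, src, dst, kind):
        self.edges.add((src, dst, kind))

    def has_edge(self, src_name, dst_name, kind):
        # True if any edge of the given kind links calls with these names.
        return any(
            self.nodes[s] == src_name and self.nodes[d] == dst_name and k == kind
            for s, d, k in self.edges
        )


def build_guarded_usage():
    # Hand-built graph for the Java snippet:
    #   Iterator<String> it = list.iterator();
    #   if (it.hasNext()) { String s = it.next(); }
    g = ApiFlowGraph()
    it = g.add_call("List.iterator")
    has_next = g.add_call("Iterator.hasNext")
    nxt = g.add_call("Iterator.next")
    g.add_edge(it, has_next, SEQUENCE)
    g.add_edge(has_next, nxt, SEQUENCE)
    g.add_edge(it, has_next, DATA)      # the iterator object flows into hasNext()
    g.add_edge(it, nxt, DATA)           # ... and into next()
    g.add_edge(has_next, nxt, CONTROL)  # next() runs only when hasNext() is true
    return g


def build_unguarded_usage():
    # Hand-built graph for the Java snippet:
    #   Iterator<String> it = list.iterator();
    #   String s = it.next();   // may throw NoSuchElementException
    g = ApiFlowGraph()
    it = g.add_call("List.iterator")
    nxt = g.add_call("Iterator.next")
    g.add_edge(it, nxt, SEQUENCE)
    g.add_edge(it, nxt, DATA)
    return g


def is_next_guarded(graph):
    # Stand-in rule for illustration; AFGNN learns patterns instead of hard-coding them.
    return graph.has_edge("Iterator.hasNext", "Iterator.next", CONTROL)
```

The two graphs share the same data-flow edge from `List.iterator` to `Iterator.next` and differ only in the control edge, which suggests why a representation limited to the call sequence or to data flow alone could struggle to separate the two usages.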