🤖 AI Summary
Existing Python security analysis tools cover only a few vulnerability types and struggle to address diverse threats. This work proposes SAGA, a static analysis approach that represents Python source code as a symbolic control-flow graph and pairs it with a domain-specific language (DSL) for defining and composing security properties such as integrity and confidentiality. On a dataset of 108 vulnerabilities, SAGA detects and locates vulnerabilities with 100% sensitivity and 99.15% specificity (only one false positive), outperforming four common security analysis tools. The analysis completes in under 31 seconds, between 2.5× and 512.1× faster than the baseline tools.
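The two accuracy figures above use the standard definitions: sensitivity is the share of real vulnerabilities that are reported, and specificity is the share of non-vulnerable cases that are correctly left unreported. A minimal sketch of the two formulas follows; the counts are toy values chosen for illustration and are not the paper's confusion matrix.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual vulnerabilities that the tool reports."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-vulnerable cases that the tool does not flag."""
    return true_neg / (true_neg + false_pos)


# Toy counts (assumed for illustration, NOT taken from the paper).
print(f"sensitivity = {sensitivity(9, 1):.2%}")   # sensitivity = 90.00%
print(f"specificity = {specificity(38, 2):.2%}")  # specificity = 95.00%
```

A single false positive lowers specificity only, which is why a tool can report 100% sensitivity alongside a specificity just under 100%.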
📝 Abstract
Python is one of the most popular programming languages, and projects written in Python are affected by a growing number of diverse security vulnerabilities. However, existing state-of-the-art analysis tools for Python support only a few vulnerability types. Hence, there is a need for an approach that detects a wide variety of vulnerabilities in Python projects. In this paper, we propose the SAGA approach to detect and locate vulnerabilities in Python source code in a versatile way. SAGA includes a source code parser that extracts control- and data-flow information and represents it as a symbolic control-flow graph, as well as a domain-specific language for defining static aspects of the source code and their evolution during graph traversals. We have leveraged this language to define a library of static aspects for integrity, confidentiality, and other security-related properties. We have evaluated SAGA on a dataset of 108 vulnerabilities, obtaining 100% sensitivity and 99.15% specificity, with only one false positive, while outperforming four common security analysis tools. This analysis was performed in less than 31 seconds, i.e., between 2.5 and 512.1 times faster than the baseline tools.
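To give a feel for what tracking a static aspect during a traversal can look like, here is a minimal sketch built on Python's standard `ast` module. It is not SAGA's parser or DSL: it walks only straight-line, top-level statements rather than a symbolic control-flow graph, and the `SOURCES` and `SINKS` sets, along with every function name, are hypothetical choices made for illustration. The tracked aspect is a simple integrity property: whether a variable holds untrusted input.

```python
import ast

# Hypothetical source and sink names, chosen for illustration only.
SOURCES = {"input"}
SINKS = {"eval", "exec"}


def call_name(node):
    """Return the simple name of a call such as f(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def is_tainted(expr, tainted):
    """True if expr reads a tainted variable or calls a source."""
    for sub in ast.walk(expr):
        if isinstance(sub, ast.Name) and sub.id in tainted:
            return True
        if call_name(sub) in SOURCES:
            return True
    return False


def find_tainted_sinks(source_code):
    """Return (line, sink) pairs where untrusted data reaches a sink.

    Simplification: handles top-level, straight-line code only
    (no branches, loops, or function bodies).
    """
    tainted = set()   # the tracked aspect: names holding untrusted data
    findings = []
    for stmt in ast.parse(source_code).body:
        # Check sinks against the taint state before this statement.
        for sub in ast.walk(stmt):
            name = call_name(sub)
            if name in SINKS and any(is_tainted(a, tainted) for a in sub.args):
                findings.append((sub.lineno, name))
        # Update the aspect: assignments propagate or clear taint.
        if isinstance(stmt, ast.Assign):
            targets = [t.id for t in stmt.targets if isinstance(t, ast.Name)]
            if is_tainted(stmt.value, tainted):
                tainted.update(targets)
            else:
                tainted.difference_update(targets)
    return findings


sample = "\n".join([
    "user = input()",
    "safe = '1 + 1'",
    "eval(safe)",
    "cmd = user + '!'",
    "eval(cmd)",
])
print(find_tainted_sinks(sample))  # [(5, 'eval')]
```

Only the second `eval` call is reported, and the line number localizes the finding. A graph-based analysis of the kind the abstract describes would additionally have to merge aspect states where control-flow paths join, which this sketch deliberately leaves out.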