YASA: Scalable Multi-Language Taint Analysis on the Unified AST at Ant Group

📅 2026-01-24
🤖 AI Summary
This work proposes YASA, a multi-language static taint analysis framework built on a Unified Abstract Syntax Tree (UAST). It targets a gap in existing tools, which are mostly confined to a single programming language and struggle to balance scalability and precision in heterogeneous enterprise environments. By combining a language-agnostic semantic model with language-specific semantic models, the framework performs points-to analysis and taint propagation across multiple languages. On an industry-standard benchmark covering Java, JavaScript, Python, and Go, it outperforms eight existing analyzers (six single-language and two multi-language). In deployment at Ant Group, the system analyzed over 100 million lines of code across 7.3K internal applications and identified 314 previously unknown taint paths, of which 92 were confirmed as zero-day vulnerabilities and 76 have already been patched.

📝 Abstract
Modern enterprises increasingly adopt diverse technology stacks with various programming languages, posing significant challenges for static application security testing (SAST). Existing taint analysis tools are predominantly designed for single languages, requiring substantial engineering effort that scales with language diversity. While multi-language tools like CodeQL, Joern, and WALA attempt to address these challenges, they face limitations in intermediate representation design, analysis precision, and extensibility, which make them difficult to scale effectively for large-scale industrial applications at Ant Group. To bridge this gap, we present YASA (Yet Another Static Analyzer), a unified multi-language static taint analysis framework designed for industrial-scale deployment. Specifically, YASA introduces the Unified Abstract Syntax Tree (UAST), which provides a unified abstraction for compatibility across diverse programming languages. Building on the UAST, YASA performs points-to analysis and taint propagation, leveraging a unified semantic model to manage language-agnostic constructs, while incorporating language-specific semantic models to handle other unique language features. When compared to 6 single- and 2 multi-language static analyzers on an industry-standard benchmark, YASA consistently outperformed all baselines across Java, JavaScript, Python, and Go. In real-world deployment within Ant Group, YASA analyzed over 100 million lines of code across 7.3K internal applications. It identified 314 previously unknown taint paths, with 92 of them confirmed as 0-day vulnerabilities. All vulnerabilities were responsibly reported, with 76 already patched by internal development teams, demonstrating YASA's practical effectiveness for securing large-scale industrial software systems.
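To make the layering described in the abstract more concrete, the following is a minimal sketch of taint propagation over a shared tree representation. It is not YASA's implementation or API. The node types (`Name`, `Literal`, `Call`, `Assign`, `ExprStmt`), the `LanguageModel` class, and the `find_taint_flows` function are all hypothetical names chosen for illustration, and the sketch omits points-to analysis entirely. It assumes that front ends for each language have already lowered source code into the same small set of nodes, so that one walker handles the language-agnostic part while a per-language model supplies sources, sinks, and sanitizers.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


# Hypothetical unified node types (illustrative only, not YASA's UAST schema).
@dataclass
class Name:
    id: str


@dataclass
class Literal:
    value: object


@dataclass
class Call:
    callee: str
    args: List["Expr"]


Expr = Union[Name, Literal, Call]


@dataclass
class Assign:
    target: str
    value: Expr


@dataclass
class ExprStmt:
    value: Expr


Stmt = Union[Assign, ExprStmt]


@dataclass
class LanguageModel:
    """Language-specific knowledge plugged into the shared walker (assumed design)."""
    sources: Set[str]
    sinks: Set[str]
    sanitizers: Set[str] = field(default_factory=set)


def find_taint_flows(program: List[Stmt], model: LanguageModel) -> List[str]:
    """Walk a straight-line program and return the sinks reached by tainted data."""
    tainted: Set[str] = set()
    findings: List[str] = []

    def is_tainted(expr: Expr) -> bool:
        if isinstance(expr, Name):
            return expr.id in tainted
        if isinstance(expr, Call):
            # Evaluate every argument so nested sink calls are also checked.
            flags = [is_tainted(arg) for arg in expr.args]
            if expr.callee in model.sources:
                return True
            if expr.callee in model.sanitizers:
                return False
            if expr.callee in model.sinks and any(flags):
                findings.append(expr.callee)
            # Unknown callees conservatively pass taint from arguments to result.
            return any(flags)
        return False  # Literals carry no taint.

    for stmt in program:
        if isinstance(stmt, Assign):
            if is_tainted(stmt.value):
                tainted.add(stmt.target)
            else:
                tainted.discard(stmt.target)  # Reassignment clears old taint.
        else:
            is_tainted(stmt.value)
    return findings


# A Java-like snippet and a Python-like snippet, both lowered to the same nodes.
java_model = LanguageModel(
    sources={"request.getParameter"},
    sinks={"Statement.executeQuery"},
)
java_program = [
    Assign("q", Call("request.getParameter", [Literal("id")])),
    Assign("sql", Call("concat", [Literal("SELECT * FROM t WHERE id="), Name("q")])),
    ExprStmt(Call("Statement.executeQuery", [Name("sql")])),
]

python_model = LanguageModel(
    sources={"flask.request.args.get"},
    sinks={"cursor.execute"},
    sanitizers={"int"},  # Assumption: integer conversion neutralizes the input.
)
python_program = [
    Assign("q", Call("flask.request.args.get", [Literal("id")])),
    Assign("n", Call("int", [Name("q")])),
    ExprStmt(Call("cursor.execute", [Name("n")])),
]

java_findings = find_taint_flows(java_program, java_model)
python_findings = find_taint_flows(python_program, python_model)
```

In this sketch the Java-like program reports a flow into `Statement.executeQuery`, while the Python-like program reports nothing because the value passes through the assumed sanitizer. Only the `LanguageModel` differs between the two runs, which mirrors the split between language-agnostic and language-specific modeling that the abstract describes.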
Problem

Research questions and friction points this paper is trying to address.

multi-language
taint analysis
static application security testing
scalability
industrial-scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Abstract Syntax Tree
multi-language taint analysis
static application security testing
scalable static analysis
language-agnostic semantic model
Yayi Wang
Ant Group
Shenao Wang
Huazhong University of Science and Technology
Jian Zhao
Dalian University of Technology
Mechanical sensing, multistable mechanisms, inertial sensors, dynamics
Shaosen Shi
Ant Group
Ting Li
Ant Group
Yan Cheng
Ant Group
Lizhong Bian
Ant Group
Kan Yu
La Trobe University
Internet of Things, Industrial Wireless Network, Machine Learning, Blockchain
Yanjie Zhao
Huazhong University of Science and Technology
Software Engineering, Software Security
Haoyu Wang
Huazhong University of Science and Technology