🤖 AI Summary
Static binary taint analysis relies heavily on manually crafted rules, which limits its generalizability and raises engineering overhead. To address these limitations, this paper proposes LATTE, the first fully automated static binary taint analysis framework powered by large language models (LLMs). LATTE uses LLMs for both taint propagation modeling and semantic understanding of low-level code, achieving end-to-end automation through prompt engineering and eliminating the need for hand-written propagation or detection rules. Evaluated on real-world embedded firmware, LATTE discovers 37 previously unknown bugs, 7 of which have been assigned CVE numbers, and outperforms state-of-the-art tools, including Emtaint, Arbiter, and Karonte, in both vulnerability detection rate and precision. It also substantially reduces manual effort and analysis cost. LATTE points to a new way of applying LLMs to low-level program security analysis and to automating binary-level taint tracking.
📝 Abstract
This paper proposes LATTE, the first static binary taint analysis that is powered by a large language model (LLM). LATTE is superior to the state of the art (e.g., Emtaint, Arbiter, Karonte) in three aspects. First, LATTE is fully automated, while prior static binary taint analyzers rely on human expertise to manually customize taint propagation rules and vulnerability inspection rules. Second, LATTE is highly effective in vulnerability detection, as demonstrated by our comprehensive evaluations. For example, LATTE has found 37 new bugs in real-world firmware that the baselines failed to find, and 7 of them have been assigned CVE numbers. Lastly, LATTE incurs remarkably low engineering cost, making it a cost-efficient and scalable solution for security researchers and practitioners. We strongly believe that LATTE opens up a new direction for harnessing recent advances in LLMs to improve vulnerability analysis for binary programs.