Harnessing the Power of LLM to Support Binary Taint Analysis

📅 2023-10-12
🏛️ arXiv.org
📈 Citations: 13
Influential: 0
🤖 AI Summary
Static binary taint analysis relies heavily on manually crafted rules, generalizes poorly, and carries high engineering overhead. To address these limitations, this paper proposes LATTE, the first fully automated static binary taint analysis framework powered by large language models (LLMs). LATTE integrates LLMs into both taint propagation modeling and semantic understanding of low-level code, using prompt engineering to automate the analysis end to end and remove the need for hand-written propagation or detection rules. Evaluated on real-world embedded firmware, LATTE discovers 37 previously unknown vulnerabilities (7 of which have been assigned CVE numbers) and outperforms state-of-the-art tools, including Emtaint, Arbiter, and Karonte, in both vulnerability detection rate and precision. It also substantially reduces manual effort and analysis cost. The authors present LATTE as a new direction for applying LLMs to low-level program security analysis and to automating binary-level taint tracking.
📝 Abstract
This paper proposes LATTE, the first static binary taint analysis that is powered by a large language model (LLM). LATTE is superior to the state of the art (e.g., Emtaint, Arbiter, Karonte) in three aspects. First, LATTE is fully automated, while prior static binary taint analyzers rely on human expertise to manually customize taint propagation rules and vulnerability inspection rules. Second, LATTE is significantly effective in vulnerability detection, as demonstrated by our comprehensive evaluations. For example, LATTE has found 37 new bugs in real-world firmware that the baselines failed to find, and 7 of them have been assigned CVE numbers. Lastly, LATTE incurs remarkably low engineering cost, making it a cost-efficient and scalable solution for security researchers and practitioners. We strongly believe that LATTE opens up a new direction to harness the recent advances in LLMs to improve vulnerability analysis for binary programs.
Problem

Research questions and friction points this paper is trying to address.

Software Security
Language Models
Taint Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

LATTE
Large Language Models
Software Taint Detection
Puzhuo Liu
Ant Group; Tsinghua University, China
Chengnian Sun
Associate Professor of Computer Science, University of Waterloo
Software Engineering, Programming Languages
Yaowen Zheng
Institute of Information Engineering, Chinese Academy of Sciences
System Security, IoT Security
Xuan Feng
Independent Researcher, Canada
Chuan Qin
Institute of Information Engineering, CAS; University of Chinese Academy of Sciences, China
Yuncheng Wang
Institute of Information Engineering, CAS; University of Chinese Academy of Sciences, China
Zhi Li
Institute of Information Engineering, CAS; University of Chinese Academy of Sciences, China
Limin Sun
Institute of Information Engineering, CAS; University of Chinese Academy of Sciences, China