Harnessing the Power of LLM to Support Binary Taint Analysis

📅 2023-10-12

🏛️ arXiv.org

📈 Citations: 13

✨ Influential: 0

🤖 AI Summary

Static binary taint analysis relies heavily on manually crafted rules, generalizes poorly, and carries high engineering overhead. To address these limitations, this paper proposes LATTE, the first fully automated static binary taint analysis framework powered by large language models (LLMs). LATTE uses LLMs for both taint propagation modeling and semantic understanding of low-level code, and drives the analysis end to end through prompt engineering, so no hand-written propagation or detection rules are needed. Evaluated on real-world embedded firmware, LATTE discovers 37 previously unknown vulnerabilities (7 of which were assigned CVE numbers) and outperforms state-of-the-art tools, including Emtaint, Arbiter, and Karonte, in both vulnerability detection rate and precision. It also substantially reduces manual effort and analysis cost. The authors present LATTE as a new direction for applying LLMs to binary-level vulnerability analysis.

📝 Abstract

This paper proposes LATTE, the first static binary taint analysis that is powered by a large language model (LLM). LATTE is superior to the state of the art (e.g., Emtaint, Arbiter, Karonte) in three aspects. First, LATTE is fully automated, while prior static binary taint analyzers rely on human expertise to manually customize taint propagation rules and vulnerability inspection rules. Second, LATTE is highly effective in vulnerability detection, as demonstrated by our comprehensive evaluations. For example, LATTE has found 37 new bugs in real-world firmware that the baselines failed to find, and 7 of them have been assigned CVE numbers. Lastly, LATTE incurs remarkably low engineering cost, making it a cost-efficient and scalable solution for security researchers and practitioners. We strongly believe that LATTE opens up a new direction to harness recent advances in LLMs to improve vulnerability analysis for binary programs.

Problem

Research questions and friction points this paper is trying to address.

Software Security

Language Models

Taint Detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

LATTE

Large Language Models

Software Taint Detection

🔎 Similar Papers

FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs