Accurate and Extensible Symbolic Execution of Binary Code based on Formal ISA Semantics

📅 2024-04-05

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Symbolic execution of binary programs suffers from semantic distortion and implementation errors introduced when machine code is translated to an intermediate representation (IR). Method: This paper proposes the first instruction-level symbolic execution framework grounded directly in formal ISA semantics (Rock/Sail). It bypasses conventional IR abstractions by compiling machine-readable ISA specifications into SMT-solvable symbolic semantic models and integrating them into a binary analysis platform. Contributions/Results: (1) the first end-to-end automated pipeline from formal ISA semantics to symbolic execution; (2) demonstrated extensibility on RISC-V, where modeling new instructions requires only a few hours; (3) discovery of five previously unknown bugs in angr's implementation of ISA semantics; (4) high-fidelity branch modeling and solving capability. The framework significantly improves the accuracy, trustworthiness, and development efficiency of binary symbolic execution.
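The idea of deriving symbolic semantics from a single machine-readable instruction description, rather than from a hand-written IR lifter, can be illustrated with a minimal sketch. Everything here is a hypothetical illustration and not the paper's implementation: the `ConcreteOps` and `SymbolicOps` interpreters, the `Sym` expression type, and the `addi` description are invented names, and the SMT-LIB-style strings are only printed, not passed to a solver.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # 32-bit registers, as in RV32


@dataclass(frozen=True, repr=False)
class Sym:
    """A symbolic 32-bit expression: an operator name plus operands."""
    op: str
    args: tuple = ()

    def __repr__(self):
        if not self.args:
            return self.op
        return f"({self.op} {' '.join(map(repr, self.args))})"


class ConcreteOps:
    """Interprets the instruction description on plain integers."""
    def const(self, v):
        return v & MASK32

    def add(self, a, b):
        return (a + b) & MASK32


class SymbolicOps:
    """Interprets the same description by building expression trees."""
    def const(self, v):
        return Sym(f"#x{v & MASK32:08x}")

    def add(self, a, b):
        return Sym("bvadd", (a, b))


def addi(ops, regs, rd, rs1, imm):
    """Hypothetical single description of ADDI: rd = rs1 + imm."""
    regs[rd] = ops.add(regs[rs1], ops.const(imm))


# The same description drives both concrete and symbolic execution.
concrete_regs = {"a0": 5, "a1": 0}
addi(ConcreteOps(), concrete_regs, "a1", "a0", -1)

symbolic_regs = {"a0": Sym("a0"), "a1": Sym("a1")}
addi(SymbolicOps(), symbolic_regs, "a1", "a0", 1)
print(concrete_regs["a1"], symbolic_regs["a1"])
```

Because `addi` is written once against an abstract set of operations, adding an instruction in this sketch means adding one description rather than keeping a separate lifter in sync with the ISA manual.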

📝 Abstract

Symbolic execution is an SMT-based software verification and testing technique. It requires tracking the computations performed during software simulation in order to reason about branches in the software under test. The prevailing approach to symbolic execution of binary code tracks computations by transforming the code to be tested to an architecture-independent IR and then symbolically executing this IR. However, the resulting IR must be semantically equivalent to the binary code, making this process complex and error-prone. The semantics of the binary code are specified by the targeted ISA, commonly given in natural language and requiring a manual implementation of the transformation to an IR. In recent years, the use of formal languages to describe ISA semantics in a machine-readable way has become increasingly popular. We investigate the utilization of such formal semantics for symbolic execution of binary code, achieving an accurate representation of instruction semantics. We present a prototype for the RISC-V ISA and conduct a case study to demonstrate that it can be easily extended to additional instructions. Furthermore, we perform an experimental comparison with prior work, which resulted in the discovery of five previously unknown bugs in the ISA implementation of the popular IR-based symbolic executor angr.
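How tracked computations are used to reason about branches can be sketched in a few lines. This is a toy under stated assumptions, not the paper's prototype: the two-instruction mini language (`addi`, `beq`), the 8-bit word size, and the brute-force `solve` function standing in for a real SMT solver are all invented for illustration.

```python
WIDTH = 8                  # tiny word size so brute force stays cheap
MASK = (1 << WIDTH) - 1


def solve(constraints):
    """Brute-force stand-in for an SMT solver over one WIDTH-bit input x."""
    for x in range(1 << WIDTH):
        if all(c(x) for c in constraints):
            return x
    return None


def explore(program):
    """Return (branch outcomes, witness input) for every feasible path."""
    # Each path: (tracked value as a function of x, path condition, outcomes)
    paths = [(lambda x: x, [], [])]
    for op, k in program:
        next_paths = []
        for value, cond, taken in paths:
            if op == "addi":
                # Track the computation instead of performing it on a number.
                new_value = lambda x, v=value, k=k: (v(x) + k) & MASK
                next_paths.append((new_value, cond, taken))
            elif op == "beq":
                # Fork on the branch and keep only satisfiable outcomes.
                for outcome in (True, False):
                    c = lambda x, v=value, k=k, o=outcome: (v(x) == k) == o
                    if solve(cond + [c]) is not None:
                        next_paths.append((value, cond + [c], taken + [outcome]))
            else:
                raise ValueError(f"unknown operation: {op}")
        paths = next_paths
    return [(taken, solve(cond)) for _, cond, taken in paths]


# x += 3; branch if x == 10; x += 250; branch if x == 1
program = [("addi", 3), ("beq", 10), ("addi", 250), ("beq", 1)]
results = explore(program)
for outcomes, witness in results:
    print(outcomes, witness)
```

Only three of the four branch combinations are feasible here: an input cannot take both branches, so that path is pruned. A real symbolic executor makes the same decision by querying an SMT solver with bit-vector constraints instead of enumerating inputs.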

Problem

Research questions and friction points this paper is trying to address.

Symbolic Execution

Machine Language

Software Verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic Execution

Instruction Set Architecture (ISA)

Bug Detection

🔎 Similar Papers

Macaw: A Machine Code Toolbox for the Busy Binary Analyst