Adding Compilation Metadata To Binaries To Make Disassembly Decidable

📅 2026-04-21

🤖 AI Summary
This work addresses the unreliability of disassembly, which stems from stripped binary executables carrying no record of what the compiler intended. The authors propose a lightweight metadata embedding mechanism that encodes key semantics directly in the binary, such as which bytes are meant to be executable instructions and how memory regions are expected to be bounded. The resulting format sits between a stripped black-box binary and open source, and it allows the binary to be lifted to a correct, recompilable higher-level representation. In the evaluation, the embedded metadata is roughly 17% of the size of DWARF debug information, does not affect runtime behavior or performance, and supports behavior-preserving lifting, instrumentation, and recompilation across a set of real-world C and C++ programs.

📝 Abstract
The binary executable format is the standard method for distributing and executing software. Yet, it is also as opaque a representation of software as can be. If the binary format were augmented with metadata that provides security-relevant information, such as which data is intended by the compiler to be executable instructions, or how memory regions are expected to be bounded, that would dramatically improve the safety and maintainability of software. In this paper, we propose a binary format that is a middle ground between a stripped black-box binary and open source. We provide a tool that generates metadata capturing the compiler's intent and inserts it into the binary. This metadata enables lifting to a correct and recompilable higher-level representation and makes analysis and instrumentation more reliable. Our evaluation shows that adding metadata does not affect runtime behavior or performance. Compared to DWARF, our metadata is roughly 17% of its size. We validate correctness by compiling a comprehensive set of real-world C and C++ binaries and demonstrating that they can be lifted, instrumented, and recompiled without altering their behavior.
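
To make the idea of "which data is intended by the compiler to be executable instructions" concrete, here is a minimal sketch of how a tool might consume such metadata. It is not the paper's format: the `CodeRange` record, `build_index`, and `is_code` are hypothetical names, and the sketch simply assumes the metadata can be read as a list of non-overlapping address ranges that the compiler marked as code.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeRange:
    """Hypothetical metadata record: one address range the compiler marked as code."""
    start: int  # inclusive
    end: int    # exclusive


def build_index(ranges):
    """Sort the ranges by start address and reject overlaps."""
    ordered = sorted(ranges, key=lambda r: r.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            raise ValueError("overlapping code ranges in metadata")
    return ordered


def is_code(index, addr):
    """Return True if addr lies inside a range recorded as code."""
    starts = [r.start for r in index]
    # Find the last range whose start is <= addr, then check its end.
    i = bisect_right(starts, addr) - 1
    return i >= 0 and addr < index[i].end


# Example with made-up addresses: two code ranges with a data gap between them.
index = build_index([CodeRange(0x2000, 0x2400), CodeRange(0x1000, 0x1800)])
print(is_code(index, 0x1000))  # inside the first range
print(is_code(index, 0x1800))  # in the gap, treated as data
```

With a lookup like this, a disassembler no longer has to guess whether a byte is an instruction or embedded data; it only decodes addresses the metadata classifies as code.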
Problem

Research questions and friction points this paper is trying to address.

binary format
compilation metadata
disassembly
program analysis
software safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

compilation metadata
decidable disassembly
binary lifting
recompilable binary
program analysis