Superset Decompilation

📅 2026-03-29
🤖 AI Summary
Traditional decompiler architectures are rigid: they struggle to balance soundness and precision, and they resolve ambiguities prematurely. This work proposes provenance-guided superset decompilation (PGSD), a framework that treats decompilation as the inverse of compilation and uses a modular pipeline in which relational, monotonic Datalog rules progressively raise the level of abstraction. By keeping ambiguous interpretations as parallel candidates until a final selection phase, PGSD avoids early information loss. The authors implement the approach in Manifold, a system written in Rust and Datalog that lifts Linux ELF binaries to C99 through a fine-grained intermediate representation. On GNU coreutils, Manifold's output quality matches Ghidra, IDA Pro, angr, and RetDec on multiple metrics, with fewer compiler errors, and it generalizes across compilers and optimization levels.
📝 Abstract
Reverse engineering tools remain monolithic and imperative compared to the advancement of modern compiler architectures: analyses are tied to a single mutable representation, making them difficult to extend or refine, and forcing premature choices between soundness and precision. We observe that decompilation is the reverse of compilation and can be structured as a sequence of modular passes, each performing a granular and clearly defined interpretation of the binary at a progressively higher level of abstraction. We formalize this as provenance-guided superset decompilation (PGSD), a framework that monotonically derives facts about the binary into a relation store. Instead of committing early to a single interpretation, the pipeline retains ambiguous interpretations as parallel candidates with provenance, deferring resolution until the final selection phase. Manifold implements PGSD as a declarative reverse engineering framework that lifts Linux ELF binaries to C99 through a granular intermediate representation in ~35K lines of Rust and Datalog. On GNU coreutils, Manifold's output quality matches Ghidra, IDA Pro, angr, and RetDec on multiple metrics while producing fewer compiler errors, and generalizes across compilers and optimization levels.
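The three ideas in the abstract (an append-only relation store, candidates kept in parallel with provenance, and resolution deferred to a final selection step) can be illustrated with a toy sketch. This is not Manifold's code or API: the `Fact` and `RelationStore` types, the relation names (`offset`, `jump_target`, `candidate`, `support`), and both rules are hypothetical, and Python is used here only for brevity in place of Rust and Datalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    relation: str  # hypothetical relation name, e.g. "candidate"
    args: tuple    # e.g. (0x10, "code")


class RelationStore:
    """Append-only store: facts are added, never retracted or overwritten."""

    def __init__(self):
        self.provenance = {}  # Fact -> set of (rule_name, premises)

    def add(self, fact, rule, premises=()):
        entry = (rule, tuple(premises))
        known = self.provenance.setdefault(fact, set())
        if entry in known:
            return False  # nothing new was derived
        known.add(entry)
        return True

    def facts(self, relation):
        return [f for f in self.provenance if f.relation == relation]


def rule_superset(store):
    # Hypothetical rule: every offset may be read as code or as data.
    for f in store.facts("offset"):
        for kind in ("code", "data"):
            yield Fact("candidate", (f.args[0], kind)), "superset", [f]


def rule_jump_target(store):
    # Hypothetical rule: being a jump target supports the "code" reading.
    for f in store.facts("jump_target"):
        yield Fact("support", (f.args[0], "code")), "jump_target", [f]


def run_to_fixpoint(store, rules):
    # Re-run every rule until a full pass derives nothing new.
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for fact, name, premises in list(rule(store)):
                if store.add(fact, name, premises):
                    changed = True


def select(store, default="data"):
    # Final selection: the only step that commits to one reading per offset.
    supported = {f.args for f in store.facts("support")}
    candidates = {}
    for f in store.facts("candidate"):
        offset, kind = f.args
        candidates.setdefault(offset, []).append(kind)
    chosen = {}
    for offset, kinds in candidates.items():
        backed = [k for k in kinds if (offset, k) in supported]
        chosen[offset] = backed[0] if len(backed) == 1 else default
    return chosen


store = RelationStore()
for off in (0x10, 0x14):
    store.add(Fact("offset", (off,)), "input")
store.add(Fact("jump_target", (0x10,)), "input")

run_to_fixpoint(store, [rule_superset, rule_jump_target])
result = select(store)
```

Both readings of each offset stay in the store after the fixpoint; only `select` discards one, and `store.provenance` still records which rule and premises produced every fact. A real pipeline would presumably weigh far richer evidence than this single hypothetical `support` relation.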
Problem

Research questions and friction points this paper is trying to address.

decompilation
reverse engineering
modular analysis
soundness vs precision
binary lifting
Innovation

Methods, ideas, or system contributions that make the work stand out.

superset decompilation
provenance-guided
modular reverse engineering
declarative framework
intermediate representation