🤖 AI Summary
This work addresses the trustworthiness of pointer analysis for binary programs. It proposes a verifiable pointer analysis framework built on a generic abstract domain: the framework first states the proof obligations that any abstract domain for pointer analysis must satisfy, then instantiates three domains of high, medium, and low precision. The analysis is context-sensitive and interprocedural, and it is formally proven to over-approximate the concrete semantics in a proof assistant. Applied to COTS binaries, it soundly derives designations for memory writes and is precise enough to establish basic sanity properties, such as that relevant parts of the stack frame are not overwritten during function execution. The framework thus combines formal correctness guarantees with controllable precision at the binary level, providing a trustworthy foundation for downstream applications such as symbolic execution, testing, formal verification, and decompilation.
📝 Abstract
Binary-level pointer analysis is useful in symbolic execution, testing, verification, and decompilation of software binaries. In many of these contexts it is crucial that the result is trustworthy, i.e., that it can be formally established that the pointer designations are overapproximative. This paper presents an approach to binary-level pointer analysis that is formally proven correct. A salient property of our approach is that it first identifies the proof obligations that a generic abstract domain for pointer analysis must satisfy. This allows easy instantiation of different domains, varying in precision, while preserving the correctness of the analysis. In the trade-off between scalability and precision, such customization allows "meaningful" precision (sufficiently precise to ensure basic sanity properties, such as that relevant parts of the stack frame are not overwritten during function execution) while also allowing coarse analysis when pointer computations have become too obfuscated during compilation for sound and accurate bounds analysis. We experiment with three different abstract domains with high, medium, and low precision. Evaluation shows that our approach soundly derives designations for memory writes in commercial off-the-shelf (COTS) binaries, in a context-sensitive interprocedural fashion.
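The idea of one generic obligation shared by domains of different precision can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the region names, the `Ptr` and `AbsPtr` types, the three toy domains, and the single obligation checked (pointer addition must stay overapproximative). The check here is a test over sample values, not a formal proof.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical memory regions (illustrative names only).
STACK, GLOBAL, HEAP = "stack", "global", "heap"


@dataclass(frozen=True)
class Ptr:
    """Concrete pointer: a region plus a byte offset into it."""
    region: str
    offset: int


@dataclass(frozen=True)
class AbsPtr:
    """Abstract pointer; None in a field means 'unknown' for that field."""
    region: Optional[str] = None
    bounds: Optional[Tuple[int, int]] = None  # inclusive offset interval

    def contains(self, p: Ptr) -> bool:
        """Concretization test: does this abstract value cover p?"""
        if self.region is not None and self.region != p.region:
            return False
        if self.bounds is not None:
            lo, hi = self.bounds
            return lo <= p.offset <= hi
        return True


# High precision: region and exact offset bounds.
def abstract_high(p: Ptr) -> AbsPtr:
    return AbsPtr(p.region, (p.offset, p.offset))

def add_high(a: AbsPtr, k: int) -> AbsPtr:
    if a.bounds is None:
        return a
    lo, hi = a.bounds
    return AbsPtr(a.region, (lo + k, hi + k))

# Medium precision: region only, offsets unknown.
def abstract_medium(p: Ptr) -> AbsPtr:
    return AbsPtr(p.region, None)

def add_medium(a: AbsPtr, k: int) -> AbsPtr:
    return AbsPtr(a.region, None)

# Low precision: nothing is known about the pointer.
def abstract_low(p: Ptr) -> AbsPtr:
    return AbsPtr()

def add_low(a: AbsPtr, k: int) -> AbsPtr:
    return AbsPtr()


DOMAINS = {
    "high": (abstract_high, add_high),
    "medium": (abstract_medium, add_medium),
    "low": (abstract_low, add_low),
}


def obligation_holds(abstract, add, samples, deltas) -> bool:
    """Generic obligation: abstract addition must cover concrete addition."""
    for p in samples:
        a = abstract(p)
        for k in deltas:
            concrete = Ptr(p.region, p.offset + k)
            if not add(a, k).contains(concrete):
                return False
    return True
```

Under these assumptions, all three domains pass the same obligation, but only the high-precision one can rule out a write to a specific stack slot. For example, with a hypothetical return address at `Ptr(STACK, 0)` and a write through `Ptr(STACK, -24)` plus 8, `add_high` yields bounds that exclude offset 0, while `add_medium` and `add_low` still cover it and so cannot exclude the overwrite.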