🤖 AI Summary
Existing neural disassemblers frequently produce structurally invalid outputs that violate program-structure constraints, particularly post-dominance relations, which severely undermines their practical utility. This paper formally encodes post-dominance as an optimizable structural constraint and proposes a structure-aware Transformer architecture with a constraint-driven, two-stage decoding framework: a front end jointly models instruction sequences and post-dominance graphs, while a back end enforces constraint satisfaction through structure-aware decoding and lightweight post-processing. The method achieves state-of-the-art accuracy while guaranteeing 100% structural legality (zero constraint violations) across diverse binary formats, with bounded inference latency suitable for real-world reverse engineering. Key contributions include: (i) the first formalization of post-dominance as a differentiable structural constraint for neural disassembly; (ii) a globally–locally coordinated neural architecture that integrates control-flow semantics into sequence modeling; and (iii) the first end-to-end verifiable disassembler explicitly designed for post-dominance compliance.
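The following is a rough illustration of the post-dominance graphs mentioned in this summary. The sketch computes post-dominator sets over a toy superset-disassembly graph, where every byte offset is treated as a candidate instruction. The input format, the virtual `EXIT` node, and the name `post_dominators` are all hypothetical and do not come from Tady's implementation. `succ` maps each candidate offset to its fall-through and direct-branch successors. The sketch also assumes every candidate can reach the exit.

```python
EXIT = -1  # hypothetical virtual exit node shared by all terminal candidates


def post_dominators(succ: dict[int, list[int]]) -> dict[int, set[int]]:
    """Compute post-dominator sets with the iterative data-flow equation
    pdom(n) = {n} | intersection of pdom(s) over all successors s of n.
    Assumes every candidate can reach EXIT."""
    nodes = set(succ) | {s for targets in succ.values() for s in targets} | {EXIT}
    # Candidates with no successors (e.g. a `ret`) flow into the virtual exit.
    edges = {n: (succ.get(n) or [EXIT]) for n in nodes if n != EXIT}
    # Start from the full node set and shrink until a fixed point is reached.
    pdom = {n: set(nodes) for n in nodes}
    pdom[EXIT] = {EXIT}
    changed = True
    while changed:
        changed = False
        for n, targets in edges.items():
            new = {n} | set.intersection(*(pdom[s] for s in targets))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


# Toy superset graph: offset 0 branches to 2 or 4, both fall through to 5 (a return);
# offset 1 is an overlapping candidate decoded from inside the instruction at 0.
succ = {0: [2, 4], 1: [5], 2: [5], 4: [5], 5: []}
pdom = post_dominators(succ)
print(sorted(pdom[0]))  # [-1, 0, 5]
```

The simple iterative formulation is used for readability. Post-dominators can equally be obtained by running a faster dominator algorithm, such as Lengauer–Tarjan, on the reversed graph.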
📝 Abstract
Disassembly is a crucial yet challenging step in binary analysis. While emerging neural disassemblers show promise for efficiency and accuracy, they frequently generate outputs violating fundamental structural constraints, which significantly compromise their practical usability. To address this critical problem, we regularize the disassembly solution space by formalizing and applying key structural constraints based on post-dominance relations. This approach systematically detects widespread errors in existing neural disassemblers' outputs. These errors often originate from models' limited context modeling and instruction-level decoding that neglect global structural integrity. We introduce Tady, a novel neural disassembler featuring an improved model architecture and a dedicated post-processing algorithm, specifically engineered to address these deficiencies. Comprehensive evaluations on diverse binaries demonstrate that Tady effectively eliminates structural constraint violations and functions with high efficiency, while maintaining instruction-level accuracy.
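The abstract does not spell out the exact constraints, so the following is only an assumed illustration of how a post-dominance rule could flag structural errors. Under this rule, a candidate predicted to be a true instruction requires every candidate that post-dominates it in the superset graph to be predicted as an instruction too. The reasoning is that any real execution path from the former must pass through the latter. The names `find_violations`, `pdom`, and `predicted` are hypothetical. The rule also ignores complications such as indirect jumps and non-returning calls.

```python
EXIT = -1  # hypothetical virtual exit node included in each post-dominator set


def find_violations(pdom: dict[int, set[int]], is_inst: dict[int, bool]) -> list[tuple[int, int]]:
    """Return (n, p) pairs where candidate n is predicted to be an instruction
    but one of its post-dominators p is not (assumed constraint)."""
    violations = []
    for n, dominators in pdom.items():
        if n == EXIT or not is_inst.get(n, False):
            continue  # only predicted instructions impose the constraint
        for p in dominators - {n, EXIT}:
            if not is_inst.get(p, False):
                violations.append((n, p))
    return sorted(violations)


# Post-dominator sets for a toy graph: 0 branches to 2 or 4, which both reach 5.
pdom = {0: {0, 5, EXIT}, 2: {2, 5, EXIT}, 4: {4, 5, EXIT}, 5: {5, EXIT}, EXIT: {EXIT}}
# A model that accepts 0, 2 and 4 but rejects their common post-dominator 5.
predicted = {0: True, 2: True, 4: True, 5: False}
print(find_violations(pdom, predicted))  # [(0, 5), (2, 5), (4, 5)]
```

Setting `predicted[5]` to `True` makes the returned list empty. A single wrongly rejected post-dominator therefore shows up as several violations at once.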