AI Summary
This work addresses the challenge of recovering composite data structure layouts and function signatures in stripped binary programs. To this end, the authors propose XTRIDE, the first method to jointly infer structures and function signatures using an n-gram model grounded in real-world type distributions. XTRIDE integrates an efficient training strategy, a confidence scoring mechanism, and a joint inference framework, enabling high-throughput analysis while providing actionable confidence estimates. Evaluated on the DIRT dataset, XTRIDE achieves a type inference accuracy of 90.15%, surpassing the state of the art by 5.09 percentage points. It also accelerates structure reconstruction by 70× to 2300× relative to prior struct recovery approaches and yields the highest proportion of completely correct layout reconstructions among existing approaches.
Abstract
Recovering types from stripped binaries is key to exact decompilation, yet it remains difficult in practice. For composite structures in particular, both layout and semantic fidelity are required to enable end-to-end reconstruction. Many existing approaches either synthesize layouts or infer names post hoc, which weakens downstream usability. This is further aggravated by excessive runtime overhead, which is especially prohibitive in automated environments. We present XTRIDE, an improved n-gram-based approach that focuses on practicality: highly optimized throughput and actionable confidence scores allow for deployment in automated pipelines. When compared to the state of the art in struct recovery, our method achieves comparable performance while being between 70 and 2300 times faster. As our inference is grounded in real-world types, we achieve the highest ratio of fully correct struct layouts. With an optimized training regimen, our model outperforms the current state of the art on the DIRT dataset by 5.09 percentage points, achieving 90.15% type inference accuracy overall. Furthermore, n-gram-based type prediction generalizes to function signature recovery: in a case study on embedded firmware, we show that this efficient approach to function similarity can assist in typical reverse engineering tasks.
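The abstract does not describe the model's internals, so the following is only a rough sketch of what n-gram-based type prediction with a confidence score could look like in general. It is not the XTRIDE implementation. The class name `NGramTypePredictor`, the opcode-like tokens, the back-off from longer to shorter contexts, and the use of relative frequency as the confidence value are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class NGramTypePredictor:
    """Toy n-gram type predictor (hypothetical; not the XTRIDE code)."""

    def __init__(self, max_n=3):
        self.max_n = max_n
        # Maps a context (tuple of the last n tokens) to type-label counts.
        self.counts = defaultdict(Counter)

    def train(self, samples):
        """samples: iterable of (context_tokens, type_label) pairs."""
        for tokens, label in samples:
            for n in range(1, self.max_n + 1):
                if len(tokens) >= n:
                    self.counts[tuple(tokens[-n:])][label] += 1

    def predict(self, tokens):
        """Return (type_label, confidence), backing off to shorter contexts."""
        for n in range(min(self.max_n, len(tokens)), 0, -1):
            seen = self.counts.get(tuple(tokens[-n:]))
            if seen:
                label, hits = seen.most_common(1)[0]
                # Assumed confidence: share of training hits for the top label.
                return label, hits / sum(seen.values())
        return None, 0.0


# Made-up training data: token contexts paired with the observed type.
training = [
    (["mov", "add", "call_strlen"], "char *"),
    (["lea", "add", "call_strlen"], "char *"),
    (["mov", "cmp", "jl"], "int"),
    (["mov", "cmp", "jl"], "int"),
    (["mov", "cmp", "jl"], "int"),
    (["mov", "cmp", "jl"], "size_t"),
]

model = NGramTypePredictor(max_n=3)
model.train(training)

exact = model.predict(["mov", "cmp", "jl"])               # full 3-gram match
backoff = model.predict(["push", "add", "call_strlen"])   # falls back to 2-gram
unknown = model.predict(["nop"])                          # no context seen
```

In an automated pipeline, a caller could accept a prediction only when its confidence exceeds a chosen threshold and leave the remaining variables untyped for manual review, which is one way a confidence score becomes actionable.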