BLens: Contrastive Captioning of Binary Functions using Ensemble Embedding

📅 2024-09-12
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address weak cross-project generalization and poor interpretability in binary function name prediction, this paper pioneers the adoption of image captioning paradigms in reverse engineering, proposing a function naming method based on multimodal embedding alignment and a lightweight Transformer decoder. The key contributions are: (1) a unified multimodal embedding representation integrating control-flow graphs and semantic features of binary functions; (2) a contrastive learning–driven alignment mechanism between function embeddings and the natural-language name space; and (3) a lightweight Transformer decoder customized for function name generation. Experiments demonstrate state-of-the-art performance: an F₁ score of 0.77 on in-distribution binary tasks (+0.10 over prior SOTA) and 0.46 on cross-project evaluation (+0.17 over SOTA), confirming substantial improvements in both generalization capability and model interpretability.

๐Ÿ“ Abstract
Function names can greatly aid human reverse engineers, which has spurred development of machine learning-based approaches to predicting function names in stripped binaries. Much current work in this area now uses transformers, applying a metaphor of machine translation from code to function names. Still, function naming models face challenges in generalizing to projects completely unrelated to the training set. In this paper, we take a completely new approach by transferring advances in automated image captioning to the domain of binary reverse engineering, such that different parts of a binary function can be associated with parts of its name. We propose BLens, which combines multiple binary function embeddings into a new ensemble representation, aligns it with the name representation latent space via a contrastive learning approach, and generates function names with a transformer architecture tailored for function names. In our experiments, we demonstrate that BLens significantly outperforms the state of the art. In the usual setting of splitting per binary, we achieve an $F_1$ score of 0.77 compared to 0.67. Moreover, in the cross-project setting, which emphasizes generalizability, we achieve an $F_1$ score of 0.46 compared to 0.29.
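The abstract's central idea is aligning binary function embeddings with a name-representation latent space via contrastive learning. A minimal sketch of such an alignment objective is a CLIP-style symmetric InfoNCE loss; this is an illustration of the general technique only, not BLens's actual implementation, and the loss form, normalization, and temperature value are assumptions:

```python
import numpy as np

def info_nce_loss(func_emb, name_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning function embeddings with name
    embeddings. func_emb and name_emb are (batch, dim) arrays where
    row i of each forms a matching (function, name) pair."""
    # L2-normalize so dot products are cosine similarities.
    f = func_emb / np.linalg.norm(func_emb, axis=1, keepdims=True)
    n = name_emb / np.linalg.norm(name_emb, axis=1, keepdims=True)
    logits = f @ n.T / temperature  # (batch, batch) similarity matrix

    # Matching pairs lie on the diagonal; alignment is framed as
    # classification over the batch, in both directions.
    idx = np.arange(len(f))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Training with such a loss pulls each function embedding toward the embedding of its own name and pushes it away from the other names in the batch, which is what lets the decoder later associate parts of a function with parts of its name.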
Problem

Research questions and friction points this paper addresses.

Machine Learning
Binary Code Analysis
Function Naming
Innovation

Methods, ideas, or system contributions that make the work stand out.

Binary Function Analysis
Machine Learning Enhancement
Name Prediction Accuracy
Tristan Benoit
LMU Munich, Germany; Bundeswehr University Munich, Germany
Yunru Wang
LMU Munich, Germany
Moritz Dannehl
LMU Munich, Germany
Johannes Kinder
LMU Munich, Germany