BinCoFer: Three-Stage Purification for Effective C/C++ Binary Third-Party Library Detection

📅 2025-04-28

🤖 AI Summary

This paper addresses third-party library (TPL) detection in C/C++ binary programs. The setting is complicated by unavailable source code, redundant feature databases, and frequently missed partially referenced libraries. To handle it, the paper proposes a binary-level TPL identification framework that does not require source code. Methodologically, it introduces a three-stage feature purification strategy that focuses on core functions and extracts robust function-level features. It also designs a weighted similarity aggregation mechanism that replaces rigid static thresholds, which significantly improves discrimination for partially linked TPLs. Evaluated on an ArchLinux dataset, the approach achieves on average 12.7% higher recall than ModX, B2SFinder, LibAM, and BinaryAI. It also attains a 91.4% F1-score on partially referenced TPLs, demonstrating both high precision and computational efficiency.
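The contrast between a static threshold and weighted aggregation can be illustrated with a minimal Python sketch. It is not BinCoFer's actual design. It assumes that a BCSD model has already produced similarity scores in [0, 1] between each library core function and every function in the target binary, and that each core function carries a weight. The weighting scheme, the function names, and the data layout are all hypothetical. The baseline counts the core functions whose best score clears a fixed cutoff. The aggregated score instead lets strong matches on high-weight functions count for more than weak matches elsewhere.

```python
from typing import Dict, List

# Hypothetical input layout: core-function name -> similarity scores (0..1)
# against every function in the target binary, as produced by some BCSD model.
Similarities = Dict[str, List[float]]


def static_threshold_matches(sims: Similarities, tau: float) -> int:
    """Baseline: count core functions whose best score reaches a fixed cutoff."""
    return sum(1 for scores in sims.values() if scores and max(scores) >= tau)


def weighted_reuse_score(sims: Similarities, weights: Dict[str, float]) -> float:
    """Weighted average of each core function's best similarity to the target.

    Core functions missing from `weights` default to weight 1.0 (an assumption).
    A core function with no candidate scores contributes a similarity of 0.0.
    """
    total_weight = sum(weights.get(f, 1.0) for f in sims)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        weights.get(f, 1.0) * (max(scores) if scores else 0.0)
        for f, scores in sims.items()
    )
    return weighted_sum / total_weight
```

Take three core functions with hypothetical best scores 0.91, 0.62, and 0.58 and weights 3, 1, and 1. A fixed 0.8 cutoff confirms only one of the three functions. The weighted score is (3 × 0.91 + 0.62 + 0.58) / 5 ≈ 0.786, and a library-level decision can then be made on that single aggregated value.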

📝 Abstract

Third-party libraries (TPLs) are becoming increasingly popular as a way to achieve efficient and concise software development. However, unregulated use of TPLs can introduce legal and security issues. Consequently, some studies have attempted to detect the reuse of TPLs in target programs by constructing a feature repository. Most of these works require access to the source code of TPLs. The others suffer from redundancy in the repository, low detection efficiency, and difficulty in detecting partially referenced third-party libraries. Therefore, we introduce BinCoFer, a tool designed for detecting TPLs reused in binary programs. We leverage work on binary code similarity detection (BCSD) to extract binary-format TPL features, making the tool suitable for scenarios where the source code of TPLs is inaccessible. BinCoFer employs a novel three-stage purification strategy that mitigates feature repository redundancy by highlighting core functions and extracting function-level features, making it applicable to scenarios of partial TPL reuse. We have observed that directly using a similarity threshold to determine reuse between two binary functions is inaccurate, a problem that previous work has not addressed. Thus, we design a method that uses weights to aggregate the similarity between functions in the target binary and core functions to ultimately judge the reuse situation with high frequency. To examine the ability of BinCoFer, we compiled a dataset on ArchLinux and conducted comparative experiments on it against the four most closely related works (i.e., ModX, B2SFinder, LibAM, and BinaryAI)...
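The abstract does not spell out the three purification stages. The following is therefore only a sketch of one plausible redundancy filter, not BinCoFer's procedure. It assumes each library in the repository is represented as a set of function-feature identifiers (for instance, hashes of normalized functions). It drops any feature that appears in more than `max_libs` libraries. The reasoning is that code shared across many libraries, such as bundled helpers, says little about which specific TPL was reused. `purify_repository` and `max_libs` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Set


def purify_repository(repo: Dict[str, Set[str]], max_libs: int = 1) -> Dict[str, Set[str]]:
    """Keep only function features that occur in at most `max_libs` libraries.

    `repo` maps a library name to a set of function-feature identifiers
    (hypothetical representation, e.g. hashes of normalized function bodies).
    """
    # Count how many distinct libraries contain each feature.
    occurrences: Dict[str, int] = defaultdict(int)
    for features in repo.values():
        for feat in features:
            occurrences[feat] += 1
    # Retain only features distinctive enough to identify their library.
    return {
        lib: {feat for feat in features if occurrences[feat] <= max_libs}
        for lib, features in repo.items()
    }
```

Raising `max_libs` keeps more shared code. That trades repository size and distinctiveness for coverage.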

Problem

Research questions and friction points this paper is trying to address.

Detects third-party library reuse in binary programs

Reduces feature repository redundancy via purification

Improves accuracy in partial library reuse detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Binary code similarity detection for TPL feature extraction

Three-stage purification strategy to reduce redundancy

Weight-based similarity aggregation for accurate reuse judgment
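Embedding-based BCSD approaches commonly map each binary function to a vector and compare vectors with cosine similarity. The sketch below assumes such vectors already exist and implies no particular BCSD model; the toy 2-D vectors and all names are hypothetical. For each core library function, it finds the most similar function in the target binary. The resulting best scores are the kind of per-function similarities that a weight-based aggregation step would consume.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matches(
    core: Dict[str, List[float]], target: Dict[str, List[float]]
) -> Dict[str, Tuple[Optional[str], float]]:
    """Map each core library function to its most similar target function.

    If the target binary has no functions, the match is (None, 0.0).
    """
    result: Dict[str, Tuple[Optional[str], float]] = {}
    for name, vec in core.items():
        best_name: Optional[str] = None
        best_sim = 0.0
        for t_name, t_vec in target.items():
            sim = cosine(vec, t_vec)
            # The first candidate always becomes the provisional best match.
            if best_name is None or sim > best_sim:
                best_name, best_sim = t_name, sim
        result[name] = (best_name, best_sim)
    return result
```

The brute-force double loop keeps the sketch readable. A repository of realistic size would normally use a vector index for nearest-neighbor search instead.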
