Revisiting Third-Party Library Detection: A Ground Truth Dataset and Its Implications Across Security Tasks

📅 2025-09-04
🤖 AI Summary
Problem: Accurate detection of Android third-party libraries (TPLs) is critical for vulnerability tracking, malware analysis, and supply-chain auditing; however, the real-world effectiveness of existing TPL detection tools remains poorly understood. Method: We construct the first large-scale, manually curated benchmark dataset, featuring precise version-level annotations for both remote and local dependencies, and systematically evaluate 10 state-of-the-art TPL detectors across R8 obfuscation robustness, version identification accuracy, and scalability. Our empirical study analyzes over 6,000 Android apps using static analysis, similarity matching, and version-aware dependency resolution. Contribution/Results: We uncover pervasive limitations (high obfuscation sensitivity, frequent version misidentification, and prohibitive resource overhead) and quantify their detrimental impact on downstream security tasks. Crucially, we establish interpretable, empirically grounded links between TPL characteristics (e.g., packaging style, obfuscation resilience) and detector performance, providing a foundational benchmark and actionable insights for developing robust, fine-grained TPL detection methods.

📝 Abstract
Accurate detection of third-party libraries (TPLs) is fundamental to Android security, supporting vulnerability tracking, malware detection, and supply chain auditing. Despite many proposed tools, their real-world effectiveness remains unclear. We present the first large-scale empirical study of ten state-of-the-art TPL detection techniques across over 6,000 apps, enabled by a new ground truth dataset with precise version-level annotations for both remote and local dependencies. Our evaluation exposes tool fragility to R8-era transformations, weak version discrimination, inaccurate correspondence of candidate libraries, difficulty in generalizing similarity thresholds, and prohibitive runtime/memory overheads at scale. Beyond tool assessment, we further analyze how TPLs shape downstream tasks, including vulnerability analysis, malware detection, secret leakage assessment, and LLM-based evaluation. From this perspective, our study provides concrete insights into how TPL characteristics affect these tasks and informs future improvements in security analysis.
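To make the points about weak version discrimination and hard-to-generalize similarity thresholds concrete, here is a minimal sketch of threshold-based matching. It is an illustration only, not the method of the paper or of any of the ten evaluated tools: the Jaccard measure, the `match_versions` helper, the feature strings, and the version numbers are all assumptions chosen for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match_versions(app_features, candidates, threshold):
    """Return (version, score) pairs scoring at or above threshold, best first."""
    scored = [(version, jaccard(app_features, feats))
              for version, feats in candidates.items()]
    hits = [(version, score) for version, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical method-signature features for two adjacent library versions.
candidates = {
    "1.0.0": {"A.a()", "A.b()", "B.c()", "B.d()"},
    "1.0.1": {"A.a()", "A.b()", "B.c()", "B.d()", "B.e()"},
}

# Hypothetical app copy of 1.0.1 after code shrinking removed the unused B.d().
app_features = {"A.a()", "A.b()", "B.c()", "B.e()"}

loose = match_versions(app_features, candidates, threshold=0.5)
strict = match_versions(app_features, candidates, threshold=0.9)
print(loose)
print(strict)
```

In this toy case a loose threshold reports both versions as matches, while a strict one reports neither, even though the app really bundles 1.0.1. No single cut-off separates the two versions once shrinking has altered the library's features, which is the kind of behavior the abstract attributes to existing tools.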
Problem

Research questions and friction points this paper is trying to address.

Evaluating effectiveness of third-party library detection tools
Assessing tool performance across security analysis tasks
Analyzing impact of TPL characteristics on security outcomes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ground truth dataset with version-level annotations
Empirical study of ten TPL detection techniques
Analysis of TPL impact on security tasks
Jintao Gu
Beijing University of Posts and Telecommunications, China
Haolang Lu
Beijing University of Posts and Telecommunications, China
Guoshun Nan
Professor of Beijing University of Posts and Telecommunications
Multimodal Learning, Video LLM, 6G Security, Semantic Communications
Yihan Lin
Assistant Professor, Xiamen University
Brain inspired Vision, Deep learning, Neuromorphic engineering, Complex networks
Kun Wang
Nanyang Technological University, Singapore
Yuchun Guo
Research Scientist, CSAIL, MIT
Computational Biology, Machine Learning, Regulatory Genomics, Epigenomics, Transcriptional Regulation
Yigui Cao
Beijing University of Posts and Telecommunications, China
Yang Liu
Nanyang Technological University, Singapore