🤖 AI Summary
Manual intelligence collection from Darknet Markets (DNMs) is high-risk, inefficient, and error-prone. To address this, the paper proposes an automated entity extraction framework designed specifically for DNMs. Methodologically: (1) the authors construct a DNM-specific annotated dataset for named entity recognition (NER); (2) they evaluate three state-of-the-art NER models (ELMo-BiLSTM, UniversalNER, and GLiNER) on DNM product listing pages, training and fine-tuning them on that dataset. The best results reach 91% precision, 96% recall, and a 94% F1 score; fine-tuning improves performance, and UniversalNER performs best. The dataset, framework, and empirical analysis offer law enforcement agencies reusable technical infrastructure and methodological guidance for monitoring illicit activity.
📝 Abstract
Darknet markets (DNMs) facilitate the trade of illegal goods on a global scale. Gathering data on DNMs is critical to ensuring law enforcement agencies can effectively combat crime. Manually extracting data from DNMs is an error-prone and time-consuming task. Aiming to automate this process, we develop a framework for extracting data from DNMs and evaluate three state-of-the-art Named Entity Recognition (NER) models, ELMo-BiLSTM (Shah et al., 2022), UniversalNER (Zhou et al., 2024), and GLiNER (Zaratiana et al., 2023), on the task of extracting complex entities from DNM product listing pages. We propose a new annotated dataset, which we use to train, fine-tune, and evaluate the models. Our findings show that state-of-the-art NER models perform well in information extraction from DNMs, achieving 91% Precision, 96% Recall, and an F1 score of 94%. In addition, fine-tuning enhances model performance, with UniversalNER achieving the best performance.
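The precision, recall, and F1 figures quoted above are standard NER metrics. As a minimal sketch of how such scores are commonly computed, the following assumes exact-match, entity-level scoring over (start, end, label) spans. The abstract does not state the matching scheme the authors used, and the listing text, label names (`QUANTITY`, `ORIGIN`, `PRICE`), and helper names are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A labelled character span: [start, end) plus an entity type."""
    start: int
    end: int
    label: str


def span(text: str, substring: str, label: str) -> Entity:
    """Build an Entity from the first occurrence of `substring` in `text`."""
    start = text.index(substring)
    return Entity(start, start + len(substring), label)


def precision_recall_f1(gold: set, predicted: set) -> tuple:
    """Exact-match scoring: a prediction counts only if span and label both match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical listing text and labels (not taken from the paper's dataset).
listing = "10 units, ships from Germany, price 0.002 BTC"

gold = {
    span(listing, "10 units", "QUANTITY"),
    span(listing, "Germany", "ORIGIN"),
    span(listing, "0.002 BTC", "PRICE"),
}
predicted = {
    span(listing, "10 units", "QUANTITY"),
    span(listing, "Germany", "ORIGIN"),
    span(listing, "0.002", "PRICE"),  # boundary error: currency unit missed
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Under exact matching, the truncated price span counts as both a false positive and a false negative, so two of three entities are credited on each side. Because F1 is the harmonic mean of precision and recall, plugging in the rounded 91% and 96% gives roughly 0.934; the reported 94% presumably reflects unrounded underlying values.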