Scraping the Shadows: Deep Learning Breakthroughs in Dark Web Intelligence

📅 2025-04-01
🤖 AI Summary
To address the high-risk, inefficient, and error-prone nature of manual intelligence collection from Darknet Markets (DNMs), this paper proposes the first automated entity extraction framework specifically designed for DNMs. Methodologically: (1) we construct the first DNM-specific named entity recognition (NER) annotated dataset; (2) we systematically evaluate and optimize three state-of-the-art NER models—ELMo-BiLSTM, UniversalNER, and GLiNER—through domain-adaptive fine-tuning and structured web crawling. Our contributions include the first robust entity recognition system tailored to DNMs, achieving 91% precision, 96% recall, and 94% F1-score with UniversalNER—significantly outperforming baseline approaches. The released dataset, open-source framework, and empirical analysis provide law enforcement agencies with reusable technical infrastructure and methodological guidance for illicit activity monitoring.

📝 Abstract
Darknet markets (DNMs) facilitate the trade of illegal goods on a global scale. Gathering data on DNMs is critical to ensuring law enforcement agencies can effectively combat crime. Manually extracting data from DNMs is an error-prone and time-consuming task. Aiming to automate this process, we develop a framework for extracting data from DNMs and evaluate three state-of-the-art Named Entity Recognition (NER) models, ELMo-BiLSTM (Shah et al., 2022), UniversalNER (Zhou et al., 2024), and GLiNER (Zaratiana et al., 2023), on the task of extracting complex entities from DNM product listing pages. We propose a new annotated dataset, which we use to train, fine-tune, and evaluate the models. Our findings show that state-of-the-art NER models perform well in information extraction from DNMs, achieving 91% Precision, 96% Recall, and an F1 score of 94%. In addition, fine-tuning enhances model performance, with UniversalNER achieving the best performance.
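As a reading aid for the Precision, Recall, and F1 figures quoted above, here is a minimal sketch of how entity-level scores are commonly computed under exact-match scoring. It is not taken from the paper: the matching rule, the `entity_scores` and `f1_score` helpers, and the entity labels are illustrative assumptions.

```python
from typing import Set, Tuple

# Hypothetical entity representation: (start offset, end offset, label).
Entity = Tuple[int, int, str]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def entity_scores(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1.

    A prediction counts as correct only if its offsets and label
    both match a gold entity exactly (an assumption of this sketch).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Made-up annotations for one listing; labels are placeholders.
gold = {(0, 11, "PRODUCT"), (12, 14, "QUANTITY"), (33, 42, "PRICE")}
predicted = {(0, 11, "PRODUCT"), (12, 14, "QUANTITY"), (26, 28, "PRICE")}

precision, recall, f1 = entity_scores(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Note that the harmonic mean of the rounded values 0.91 and 0.96 is about 0.934. The reported F1 of 94% was therefore presumably computed from unrounded scores or averaged across entity types; the abstract does not say which.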
Problem

Research questions and friction points this paper is trying to address.

Automating data extraction from darknet markets for law enforcement
Evaluating how well NER models extract complex entities
Creating an annotated dataset to improve model accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated data extraction from darknet markets
Applied three state-of-the-art NER models: ELMo-BiLSTM, UniversalNER, and GLiNER
Created an annotated dataset for training, fine-tuning, and evaluation
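
To make the idea of an annotated NER dataset concrete, the sketch below converts character-span annotations into token-level BIO tags, a common encoding for sequence taggers. The paper's actual annotation format is not described on this page, so the span format, the whitespace tokenizer, the `spans_to_bio` helper, and the labels (`PRODUCT`, `QUANTITY`, `ORIGIN`, `PRICE`) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical annotation: (start char, end char exclusive, label).
Span = Tuple[int, int, str]


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and record each token's character offsets."""
    tokens = []
    position = 0
    for token in text.split():
        start = text.index(token, position)
        end = start + len(token)
        tokens.append((token, start, end))
        position = end
    return tokens


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Tag each token B-<label>, I-<label>, or O based on the given spans."""
    tokens = tokenize_with_offsets(text)
    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to a span only if it lies fully inside it.
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return [(token, tag) for (token, _, _), tag in zip(tokens, tags)]


# Made-up listing text with placeholder annotations.
listing = "Sample item 5g ships from NL for 0.002 BTC"
annotations = [
    (0, 11, "PRODUCT"),
    (12, 14, "QUANTITY"),
    (26, 28, "ORIGIN"),
    (33, 42, "PRICE"),
]

tagged = spans_to_bio(listing, annotations)
for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Span-based models such as GLiNER can typically consume the offsets directly, so a conversion like this would mainly matter for a tagger-style model.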
Authors

Ingmar Bakermans
Jheronimus Academy of Data Science, 's-Hertogenbosch, The Netherlands
Daniel De Pascale
Jheronimus Academy of Data Science, 's-Hertogenbosch, The Netherlands
Gonçalo Marcelino
University of Amsterdam, Amsterdam, The Netherlands
Giuseppe Cascavilla
Tilburg University
Cyber Threat Intelligence, Cybersecurity, Big Data Analysis, OSINT, IoT
Zeno Geradts
University of Amsterdam, Amsterdam, The Netherlands