Combining Knowledge Graphs and NLP to Analyze Instant Messaging Data in Criminal Investigations

📅 2025-09-30
🤖 AI Summary
In criminal investigations, manually analyzing instant messaging data (e.g., WhatsApp) is labor-intensive and slows evidence triage. To address this, we propose an approach that combines knowledge graphs with NLP models: message data extracted from suspects' mobile phones is modeled as a knowledge graph, voice messages are transcribed with automatic speech recognition, and the data is annotated with an end-to-end entity extraction approach. Users can explore the enriched data either by querying and visualizing the graph or through semantic search, and can verify every result against the original data. Early prototypes have been applied to real investigation data, and the prosecutors involved gave positive feedback while also pointing to open opportunities and research directions.

📝 Abstract
Criminal investigations often involve the analysis of messages exchanged through instant messaging apps such as WhatsApp, which can be an extremely labor-intensive task. Our approach integrates knowledge graphs and NLP models to support this analysis by semantically enriching data collected from suspects' mobile phones, helping prosecutors and investigators search the data and gain valuable insights. Our semantic enrichment process involves extracting message data and modeling it as a knowledge graph, generating transcriptions of voice messages, and annotating the data using an end-to-end entity extraction approach. We adopt two different solutions to help users gain insights into the data: one based on querying and visualizing the graph, and one based on semantic search. The proposed approach ensures that users can verify the information by accessing the original data. While we report on early results and prototypes developed in the context of an ongoing project, our proposal has been applied in practice to real investigation data. As a consequence, we had the chance to interact closely with prosecutors, collecting positive feedback and identifying interesting opportunities and promising research directions to share with the research community.
Problem

Research questions and friction points this paper is trying to address.

Analyzing instant messaging data in criminal investigations is labor-intensive
Integrating knowledge graphs and NLP to semantically enrich message data
Enabling investigators to search and verify insights from mobile evidence
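
To make the "search and verify" point concrete, the sketch below ranks messages against a query and returns message IDs, so that each hit can be traced back to the original message. It is only an illustration, not the paper's method: `embed` is a toy bag-of-words vector standing in for whatever semantic encoder the authors use, and the names `embed`, `cosine`, `search`, and the sample `corpus` are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a neural sentence encoder)."""
    return Counter(tok.strip(".,!?").lower() for tok in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, corpus, top_k=2):
    """Rank messages by similarity to the query.

    corpus maps a message ID to its text; the IDs in the result let a user
    open the original message and check the hit.
    """
    q = embed(query)
    scored = [(msg_id, cosine(q, embed(text))) for msg_id, text in corpus.items()]
    scored = [(msg_id, score) for msg_id, score in scored if score > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_k]


# Invented sample data, for illustration only.
corpus = {
    "m1": "Meet Marco at the warehouse.",
    "m2": "Ok, see you there.",
    "m3": "The money is in the car",
}
results = search("where is the money", corpus)
```

A real system would replace `embed` with a sentence-embedding model so that paraphrases match even without shared words; the part worth keeping is that results carry identifiers of the source messages rather than only extracted text.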
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating knowledge graphs with NLP models
Semantic enrichment via entity extraction and transcription
Dual query-visualization and semantic search solutions
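
As a rough illustration of the enrichment idea listed above, the following sketch models a few messages as subject-predicate-object triples, annotates them with entities, and answers a simple graph query. Everything here is assumed for the example: the schema (`sentBy`, `inChat`, `mentions`, `derivedFrom`), the `Message` fields, and the dictionary lookup in `extract_entities`, which merely stands in for the end-to-end entity extraction model described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str
    sender: str
    chat: str
    text: str                    # original text, or the transcription of a voice message
    is_transcript: bool = False  # True when text comes from speech recognition


# Hypothetical gazetteer; a trained entity extraction model would replace it.
GAZETTEER = {"marco": "Person", "milan": "Location", "warehouse": "Location"}


def extract_entities(text):
    """Return (surface form, entity type) pairs found by dictionary lookup."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    return [(tok, GAZETTEER[tok]) for tok in tokens if tok in GAZETTEER]


def build_graph(messages):
    """Build a set of (subject, predicate, object) triples from messages."""
    triples = set()
    for m in messages:
        node = f"msg:{m.msg_id}"
        triples.add((node, "sentBy", f"person:{m.sender}"))
        triples.add((node, "inChat", f"chat:{m.chat}"))
        if m.is_transcript:
            # Record that the text is derived, so users know to check the audio.
            triples.add((node, "derivedFrom", "voice_message"))
        for surface, etype in extract_entities(m.text):
            triples.add((node, "mentions", f"{etype.lower()}:{surface}"))
    return triples


def messages_mentioning(triples, entity):
    """Return the message nodes that mention the given entity node."""
    return sorted(s for s, p, o in triples if p == "mentions" and o == entity)


# Invented sample data, for illustration only.
messages = [
    Message("m1", "anna", "c1", "Meet Marco at the warehouse."),
    Message("m2", "luca", "c1", "Ok, see you there."),
    Message("m3", "luca", "c1", "Marco is in Milan today", is_transcript=True),
]
graph = build_graph(messages)
hits = messages_mentioning(graph, "person:marco")
```

Because every `mentions` edge starts from a message node, a query result always points back to specific messages, which is one simple way to support the verifiability the abstract emphasizes. A production system would more likely use an RDF or property-graph store than an in-memory set of tuples.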
Riccardo Pozzi
University of Milano-Bicocca, Milan, Italy
Valentina Barbera
University of Milano-Bicocca, Milan, Italy
Renzo Alva Principe
University of Milano-Bicocca, Milan, Italy
Davide Giardini
University of Milano-Bicocca, Milan, Italy
Riccardo Rubini
University of Milano-Bicocca, Milan, Italy
Matteo Palmonari
Associate Professor, University of Milan-Bicocca
Artificial Intelligence, Semantic Web, Data Integration