Online Density-Based Clustering for Real-Time Narrative Evolution Monitoring

📅 2026-01-28
🤖 AI Summary
This work addresses the limitations of traditional batch clustering algorithms, such as HDBSCAN, for real-time, memory-efficient, and dynamically adaptive narrative monitoring on social media. To overcome these limitations, we propose a three-stage pipeline that applies online density-based clustering to multilingual document streams. We further introduce an evaluation framework that combines conventional clustering metrics (e.g., Silhouette coefficient, Davies–Bouldin index) with narrative-specific indicators, including narrative distinctiveness, contingency, and variance. Using sliding-window simulations, we systematically compare multiple incremental clustering algorithms and identify DenStream as the method that best balances clustering quality, computational efficiency, and memory footprint. The resulting approach supports real-time tracking of narrative evolution and is practical to deploy in operational settings.
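To make the evaluation setup more concrete, the following is a minimal sketch of scoring cluster assignments with the Davies–Bouldin index over sliding windows. It is an illustration under stated assumptions, not the paper's evaluation code: the function names `davies_bouldin` and `sliding_windows`, the window parameters, and the toy 2-D points are all hypothetical, and real documents would be represented by high-dimensional embeddings.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower values mean compact, well-separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    if len(groups) < 2:
        raise ValueError("need at least two clusters")

    centers = {lab: centroid(pts) for lab, pts in groups.items()}
    # Scatter: mean distance of each cluster's members to its centroid.
    scatter = {
        lab: sum(math.dist(p, centers[lab]) for p in pts) / len(pts)
        for lab, pts in groups.items()
    }
    # For each cluster, take its worst (largest) similarity ratio to any other.
    total = 0.0
    for i in groups:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in groups
            if j != i
        )
    return total / len(groups)


def sliding_windows(items, size, step):
    """Yield consecutive windows of `size` items, advancing by `step`."""
    for start in range(0, len(items) - size + 1, step):
        yield items[start:start + size]


# Hypothetical toy data: two tight groups, ten units apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
score = davies_bouldin(points, labels)
```

In a sliding-window simulation, each window of the stream would be clustered and then scored this way, so that quality can be tracked over time rather than measured once on the full dataset.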

📝 Abstract
Automated narrative intelligence systems for social media monitoring face significant scalability challenges when relying on batch clustering methods to process continuous data streams. We investigate replacing offline HDBSCAN with online density-based clustering algorithms in a production narrative report generation pipeline that processes large volumes of multilingual social media data. While HDBSCAN effectively discovers hierarchical clusters and handles noise, its batch-only nature requires full retraining for each time window, limiting scalability and real-time adaptability. We evaluate online clustering methods with respect to cluster quality, computational efficiency, memory footprint, and integration with downstream narrative extraction. Our evaluation combines standard clustering metrics, narrative-specific measures, and human validation of cluster correctness to assess both structural quality and semantic interpretability. Experiments using sliding-window simulations on historical data from the Ukrainian information space reveal trade-offs between temporal stability and narrative coherence, with DenStream achieving the strongest overall performance. These findings bridge the gap between batch-oriented clustering approaches and the streaming requirements of large-scale narrative monitoring systems.
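The contrast between batch retraining and online updating can be sketched with a simplified fading micro-cluster model, loosely inspired by DenStream-style algorithms. This is not the paper's pipeline and not a full DenStream implementation: it merges a point when it lies within `eps` of the nearest micro-cluster center (rather than applying DenStream's radius test), it omits the offline macro-clustering step, and the class names and the parameters `eps`, `lam`, and `min_weight` are assumptions chosen for illustration.

```python
import math


class MicroCluster:
    """Weighted summary of nearby points whose influence fades over time."""

    def __init__(self, point, t):
        self.weight = 1.0
        self.linear_sum = list(point)
        self.last_update = t

    def decay(self, t, lam):
        # Exponential fading: weight halves every 1/lam time units.
        factor = 2.0 ** (-lam * (t - self.last_update))
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.last_update = t

    def center(self):
        return [s / self.weight for s in self.linear_sum]

    def absorb(self, point):
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


class OnlineClusterer:
    """Processes one point at a time; never refits on the full history."""

    def __init__(self, eps=1.0, lam=0.1, min_weight=0.1):
        self.eps = eps
        self.lam = lam
        self.min_weight = min_weight
        self.clusters = []

    def learn_one(self, point, t):
        # Fade every micro-cluster to time t, then drop the ones that faded out.
        for mc in self.clusters:
            mc.decay(t, self.lam)
        self.clusters = [mc for mc in self.clusters if mc.weight >= self.min_weight]

        nearest = min(
            self.clusters,
            key=lambda mc: math.dist(mc.center(), point),
            default=None,
        )
        if nearest is not None and math.dist(nearest.center(), point) <= self.eps:
            nearest.absorb(point)
        else:
            self.clusters.append(MicroCluster(point, t))


model = OnlineClusterer(eps=1.0, lam=0.1, min_weight=0.1)
for t, p in enumerate([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]):
    model.learn_one(p, t)
clusters_after_stream = len(model.clusters)

# Much later, stale micro-clusters have faded below min_weight and are pruned.
model.learn_one((5.0, 5.0), 100)
clusters_after_gap = len(model.clusters)
```

Because each update touches only the current set of micro-cluster summaries, memory stays bounded by the number of live micro-clusters instead of growing with the stream, which is the property that makes this family of methods attractive for continuous monitoring.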
Problem

Research questions and friction points this paper is trying to address.

online clustering
narrative evolution
social media monitoring
streaming data
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

online clustering
density-based clustering
narrative evolution
streaming data
social media monitoring
Ostap Vykhopen
Viktoria Skorik
Maxim Tereschenko
Veronika Solopova
Technische Universität Berlin
Computational linguistics, Ethics of AI