🤖 AI Summary
This work addresses three obstacles to unsupervised event detection and evolution modeling on social media: the massive scale of the data, its fragmented nature, and its lack of structured temporal context. The authors propose RagSEDE, a framework that first selects salient messages through representativeness- and diversity-driven sampling, then uses Retrieval-Augmented Generation (RAG) to improve event detection accuracy. RagSEDE also applies structural entropy, for the first time according to the authors, to dynamically characterize evolving event-related keywords, which enables the construction of a continuously updated event knowledge base. By integrating RAG, pre-trained language models, and structural information theory, the method establishes a paradigm in which unsupervised event detection and knowledge base maintenance evolve together. Experiments on two public datasets demonstrate its superiority over existing approaches.
📝 Abstract
With the growing scale of social media, social event detection and evolution modeling have attracted increasing attention. Graph neural networks (GNNs) and transformer-based pre-trained language models (PLMs) have become mainstream approaches in this area. However, existing methods still face three major challenges. First, the sheer volume of social media messages makes learning resource-intensive. Second, the fragmentation of social media messages often impedes a model's ability to capture a comprehensive view of events. Third, the lack of structured temporal context has hindered the development of effective models for event evolution, limiting users' access to event information. To address these challenges, we propose a foundation model for unsupervised Social Event Detection and Evolution, namely RagSEDE. Specifically, RagSEDE introduces a representativeness- and diversity-driven sampling strategy to extract key messages from massive social streams, significantly reducing noise and computational overhead. It further establishes a novel paradigm based on Retrieval-Augmented Generation (RAG) that enhances PLMs in detecting events while simultaneously constructing and maintaining an evolving event knowledge base. Finally, RagSEDE leverages structural information theory to dynamically model event evolution keywords for the first time. Extensive experiments on two public datasets demonstrate the superiority of RagSEDE in open-world social event detection and evolution.
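The abstract does not spell out how the representativeness- and diversity-driven sampling works, so the following is only an illustrative sketch of the general idea, not RagSEDE's actual procedure. It assumes each message already has an embedding vector, scores representativeness as mean cosine similarity to all messages, and penalizes similarity to messages already chosen (a greedy scheme in the spirit of maximal marginal relevance). The function names, the `trade_off` parameter, and the scoring rule are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_key_messages(embeddings, k, trade_off=0.5):
    """Greedily pick up to k message indices.

    Hypothetical scoring rule (an assumption, not taken from the paper):
      score = trade_off * representativeness - (1 - trade_off) * redundancy
    where representativeness is the mean similarity to all messages and
    redundancy is the highest similarity to any already-selected message.
    """
    n = len(embeddings)
    representativeness = [
        sum(cosine(e, other) for other in embeddings) / n for e in embeddings
    ]
    selected = []
    while len(selected) < min(k, n):
        best_index, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            score = trade_off * representativeness[i] - (1 - trade_off) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
    return selected


# Toy usage: three near-duplicate messages and one pointing in a different direction.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05]]
picked = select_key_messages(toy_embeddings, k=2)
print(picked)
```

In this toy input, the redundancy penalty is what lets the lone dissimilar message be chosen alongside one of the near-duplicates, which is the intuition behind sampling for both representativeness and diversity. The recomputation of similarities is quadratic in the number of messages, so a real stream would need batching or approximate nearest-neighbor search.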