Explicit and Implicit Data Augmentation for Social Event Detection

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high cost of manual annotation in social media event detection, this paper proposes SED-Aug, a dual-augmentation framework that combines explicit text-space augmentation with implicit feature-space augmentation. It designs five large language model–driven, semantics-preserving text reconstruction strategies and introduces five structure-aware feature perturbation techniques, including node masking and relation reweighting, to improve fine-grained semantic robustness in the embedding space. Evaluated on Twitter2012 and Twitter2018, SED-Aug improves the average F1 score over the best baseline by approximately 17.67% and 15.57%, respectively. This work is the first to jointly leverage large-model generative capabilities and graph-structural feature perturbations, establishing a scalable, robust augmentation paradigm for low-resource event detection.

📝 Abstract
Social event detection identifies and categorizes important events from social media. It relies on labeled data, but annotation is costly and labor-intensive. To address this problem, we propose the Augmentation framework for Social Event Detection (SED-Aug), a plug-and-play dual augmentation framework that combines explicit text-based augmentation with implicit feature-space augmentation to enhance data diversity and model robustness. The explicit augmentation uses large language models to enrich textual information through five diverse generation strategies. For implicit augmentation, we design five novel perturbation techniques that operate in the feature space on structural fused embeddings. These perturbations are crafted to preserve the semantic and relational properties of the embeddings while making them more diverse. In terms of average F1 score, SED-Aug outperforms the best baseline model by approximately 17.67% on the Twitter2012 dataset and by about 15.57% on the Twitter2018 dataset. The code is available on GitHub: https://github.com/congboma/SED-Aug.

Problem

Research questions and friction points this paper is trying to address.

Addressing costly social event annotation via data augmentation
Combining explicit and implicit methods to enhance data diversity
Improving model robustness for social media event detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines explicit and implicit data augmentation techniques
Uses large language models for text generation strategies
Applies feature-space perturbations on structural embeddings
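
The idea of perturbing embeddings in feature space while keeping their semantics can be illustrated with a minimal sketch. This is not one of the five perturbation techniques from the paper, which this page does not specify. It assumes a generic stand-in: small Gaussian noise added to a non-empty embedding vector, followed by rescaling to the original L2 norm so the perturbed copy stays close in direction and magnitude. The names `perturb_embedding`, `augment`, `noise_scale`, and `copies` are hypothetical and do not come from the SED-Aug code.

```python
import math
import random


def perturb_embedding(vec, noise_scale=0.05, rng=None):
    """Return a noisy copy of `vec`, rescaled to the original L2 norm.

    Hypothetical helper: Gaussian noise is an assumed stand-in, not a
    perturbation taken from the SED-Aug paper. `vec` must be non-empty.
    """
    rng = rng or random.Random()
    norm = math.sqrt(sum(x * x for x in vec))
    # Scale the noise so its expected norm is about noise_scale * norm.
    sigma = noise_scale * norm / math.sqrt(len(vec))
    noisy = [x + rng.gauss(0.0, sigma) for x in vec]
    noisy_norm = math.sqrt(sum(x * x for x in noisy))
    if noisy_norm == 0.0:
        return list(vec)  # all-zero input: nothing to perturb
    return [x * norm / noisy_norm for x in noisy]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def augment(embeddings, labels, copies=2, noise_scale=0.05, seed=0):
    """Append `copies` perturbed variants of each embedding, reusing its label."""
    rng = random.Random(seed)
    aug_x, aug_y = list(embeddings), list(labels)
    for vec, label in zip(embeddings, labels):
        for _ in range(copies):
            aug_x.append(perturb_embedding(vec, noise_scale, rng))
            aug_y.append(label)
    return aug_x, aug_y
```

Because the augmented vectors reuse the original labels, a sketch like this can sit in front of any classifier that consumes embeddings, which is one plausible reading of "plug-and-play" in the abstract. Checking `cosine` between an original and its perturbed copy is a quick way to confirm that `noise_scale` is small enough to leave the embedding's direction largely intact.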