🤖 AI Summary
This study addresses the lack of a unified analytical framework for information disorder, a multidimensional problem spanning hate speech, disinformation, and misinformation. It proposes an approach grounded in argument structure: the checkworthiness and hatefulness annotations of a message's premises and conclusion are used to predict the hatefulness of the whole message. This formulation explicitly connects subtasks such as hate speech detection and checkworthiness assessment. Evaluated on the WSF-ARG+ dataset of white supremacy forum messages, the approach reaches up to 96% F1 in hateful content identification, suggesting that argument structure is a promising signal for harmful content detection.
📝 Abstract
Information disorder is a challenging phenomenon that affects society at large. It entails the diffusion of misleading, misinforming, and hateful content online. In different contexts one aspect of the problem may prevail, but overall this is a broad problem that requires comprehensive solutions. While each dimension of the problem (hate speech, disinformation, misinformation, etc.) requires in-depth analysis, in this paper we investigate whether argument structure can provide information that links these different areas. In particular, we focus on the WSF-ARG+ dataset, which consists of white supremacy forum messages annotated for argument structure (premises and conclusion). There, we leverage the checkworthiness and hatefulness annotations of the argument components to obtain insights into the hatefulness of the whole message. Our results are promising (up to 96% F1), suggesting that this direction can be extended in the future to hateful content identification and to countering information disorder.
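To make the setup concrete, the following is a minimal sketch of how component-level annotations could feed a message-level hatefulness prediction and how F1 is computed. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify the classifier, so a simple hand-written rule stands in for it, and the class names, field names, and four toy messages are all hypothetical and not taken from WSF-ARG+.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One argument component. Field names are hypothetical, not the dataset schema."""
    role: str          # "premise" or "conclusion"
    checkworthy: bool  # component-level checkworthiness annotation
    hateful: bool      # component-level hatefulness annotation


@dataclass
class Message:
    components: List[Component]
    hateful: bool      # message-level gold label


def featurize(msg: Message) -> dict:
    """Aggregate component-level annotations into message-level features."""
    premises = [c for c in msg.components if c.role == "premise"]
    conclusions = [c for c in msg.components if c.role == "conclusion"]
    return {
        "any_hateful_premise": any(c.hateful for c in premises),
        "hateful_conclusion": any(c.hateful for c in conclusions),
        "any_checkworthy": any(c.checkworthy for c in msg.components),
    }


def predict(msg: Message) -> bool:
    """Stand-in rule (an assumption): flag a message if any component is hateful."""
    feats = featurize(msg)
    return feats["any_hateful_premise"] or feats["hateful_conclusion"]


def f1_score(gold: List[bool], pred: List[bool]) -> float:
    """F1 for the positive (hateful) class: harmonic mean of precision and recall."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only; these are not WSF-ARG+ messages.
messages = [
    Message([Component("premise", True, True), Component("conclusion", False, True)], True),
    Message([Component("premise", True, False), Component("conclusion", False, False)], False),
    Message([Component("premise", False, False), Component("conclusion", False, False)], True),
    Message([Component("premise", False, True), Component("conclusion", True, False)], True),
]

gold = [m.hateful for m in messages]
pred = [predict(m) for m in messages]
print(f"F1 on toy data: {f1_score(gold, pred):.2f}")
```

The third toy message is hateful as a whole although none of its components is labelled hateful, so the rule misses it. That kind of case is why a learned classifier over such features, possibly including checkworthiness, may do better than a fixed rule.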