🤖 AI Summary
Existing content moderation approaches struggle to detect gender-based discrimination and misogynistic content embedded in memes, as they often overlook community interactions and social dynamics. This work proposes MemeWeaver, an end-to-end trainable multimodal framework that models semantic and social relationships among memes through a novel inter-meme graph reasoning mechanism, overcoming the limitations of heuristic graph construction and instance-level reasoning. The method fuses visual and textual information and learns the graph structure itself, rather than relying on hand-crafted connections. Evaluated on the MAMI and EXIST benchmarks, the approach outperforms state-of-the-art methods with faster convergence and yields graph structures with strong semantic interpretability.
📝 Abstract
Women are twice as likely as men to face online harassment due to their gender. Despite recent advances in multimodal content moderation, most approaches still overlook the social dynamics behind this phenomenon, where perpetrators reinforce prejudices and group identity within like-minded communities. Graph-based methods offer a promising way to capture such interactions, yet existing solutions remain limited by heuristic graph construction, shallow modality fusion, and instance-level reasoning. In this work, we present MemeWeaver, an end-to-end trainable multimodal framework for detecting sexism and misogyny through a novel inter-meme graph reasoning mechanism. We systematically evaluate multiple visual-textual fusion strategies and show that our approach consistently outperforms state-of-the-art baselines on the MAMI and EXIST benchmarks, while achieving faster training convergence. Further analyses reveal that the learned graph structure captures semantically meaningful patterns, offering valuable insights into the relational nature of online hate.