🤖 AI Summary
Existing spoiler detection methods for online movie review platforms suffer from three key limitations: (1) reliance on unimodal text modeling, (2) neglect of heterogeneous user–movie relational structures, and (3) strong dependence on specific spoiler linguistic patterns, leading to poor cross-genre generalization. To address these issues, we propose a trimodal spoiler detection framework that jointly integrates user–movie graph structure, review text, and metadata, augmented with a domain-aware Mixture-of-Experts (MoE) mechanism. Our approach employs Graph Neural Networks (GNNs) to encode structural relations, BERT for textual representation, and learned embeddings for metadata; it further introduces multimodal expert routing and a dynamic fusion layer to enable fine-grained domain adaptation. Evaluated on two benchmark datasets, our method achieves new state-of-the-art performance—improving accuracy by 2.56% and F1-score by 8.41%—while demonstrating significantly enhanced robustness and generalization across diverse movie genres.
📝 Abstract
Online movie review websites are valuable sources of information and discussion about movies. However, the proliferation of spoiler reviews detracts from the movie-watching experience, making spoiler detection an important task. Previous methods focus solely on the textual content of reviews, ignoring the heterogeneity of information on the platform. For instance, a review's metadata and the corresponding user's information could be helpful. Moreover, the spoiler language of movie reviews tends to be genre-specific, posing a domain generalization challenge for existing methods. To this end, we propose MMoE, a multi-modal network that leverages information from multiple modalities to facilitate robust spoiler detection and adopts Mixture-of-Experts to enhance domain generalization. MMoE first extracts graph, text, and meta features from the user-movie network, the review's textual content, and the review's metadata, respectively. To handle genre-specific spoilers, we then adopt a Mixture-of-Experts architecture to process information in the three modalities and promote robustness. Finally, we use an expert fusion layer to integrate the features from different perspectives and make predictions based on the fused embedding. Experiments demonstrate that MMoE achieves state-of-the-art performance on two widely used spoiler detection datasets, surpassing previous SOTA methods by 2.56% and 8.41% in terms of accuracy and F1-score. Further experiments also demonstrate MMoE's superior robustness and generalization.
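The per-modality Mixture-of-Experts followed by an expert fusion layer can be illustrated with a minimal numpy sketch. All dimensions, weight matrices, and function names below are hypothetical stand-ins (the abstract does not specify them); the graph, text, and meta features are random toys standing in for the GNN, BERT, and metadata encoder outputs described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions, chosen only for illustration.
D_IN, D_HID, N_EXPERTS = 8, 4, 3

# Toy modality features standing in for the GNN (graph), BERT (text),
# and learned metadata embeddings.
graph_feat = rng.normal(size=D_IN)
text_feat  = rng.normal(size=D_IN)
meta_feat  = rng.normal(size=D_IN)

def moe(x, expert_ws, gate_w):
    """Route one modality feature through N_EXPERTS linear experts and
    mix their outputs with softmax gating weights (the MoE routing)."""
    gate = softmax(gate_w @ x)                   # (N_EXPERTS,)
    outs = np.stack([w @ x for w in expert_ws])  # (N_EXPERTS, D_HID)
    return gate @ outs                           # (D_HID,)

def make_moe_params():
    experts = [rng.normal(size=(D_HID, D_IN)) for _ in range(N_EXPERTS)]
    gate = rng.normal(size=(N_EXPERTS, D_IN))
    return experts, gate

# One MoE block per modality, then a fusion layer over the concatenation.
fused = np.concatenate([moe(f, *make_moe_params())
                        for f in (graph_feat, text_feat, meta_feat)])
w_fuse = rng.normal(size=3 * D_HID)
spoiler_prob = 1.0 / (1.0 + np.exp(-(w_fuse @ fused)))  # sigmoid score
```

In the actual model the experts and fusion layer would be trained networks rather than random matrices, but the data flow is the same: modality features → gated expert mixtures → concatenation → fused prediction.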