🤖 AI Summary
Existing spoiler detection methods for online movie review platforms suffer from three key limitations: (1) reliance on unimodal text modeling, (2) neglect of heterogeneous user–movie relational structures, and (3) strong dependence on specific spoiler linguistic patterns, leading to poor cross-genre generalization. To address these issues, we propose a trimodal spoiler detection framework that jointly integrates user–movie graph structure, review text, and metadata, augmented with a domain-aware Mixture-of-Experts (MoE) mechanism. Our approach employs Graph Neural Networks (GNNs) to encode structural relations, BERT for textual representation, and learned embeddings for metadata; it further introduces multimodal expert routing and a dynamic fusion layer to enable fine-grained domain adaptation. Evaluated on two benchmark datasets, our method achieves new state-of-the-art performance—improving accuracy by 2.56% and F1-score by 8.41%—while demonstrating significantly enhanced robustness and generalization across diverse movie genres.
📝 Abstract
Online movie review websites are valuable sources of information and discussion about movies. However, the proliferation of spoiler reviews detracts from the movie-watching experience, making spoiler detection an important task. Previous methods focus solely on the textual content of reviews, ignoring the heterogeneity of information on the platform. For instance, a review's metadata and the corresponding user's information could be helpful. Moreover, the spoiler language of movie reviews tends to be genre-specific, posing a domain generalization challenge for existing methods. To this end, we propose MMoE, a multi-modal network that leverages information from multiple modalities to facilitate robust spoiler detection and adopts Mixture-of-Experts to enhance domain generalization. MMoE first extracts graph, text, and meta features from the user-movie network, the review's textual content, and the review's metadata, respectively. To handle genre-specific spoilers, we then adopt a Mixture-of-Experts architecture to process information in the three modalities and promote robustness. Finally, we use an expert fusion layer to integrate the features from different perspectives and make predictions based on the fused embedding. Experiments demonstrate that MMoE achieves state-of-the-art performance on two widely used spoiler detection datasets, surpassing previous SOTA methods by 2.56% and 8.41% in terms of accuracy and F1-score. Further experiments also demonstrate MMoE's superior robustness and generalization.
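The per-modality Mixture-of-Experts followed by an expert fusion layer can be illustrated with a minimal numpy sketch. All dimensions, weight matrices, and function names below are hypothetical stand-ins (the abstract does not specify them); the graph, text, and meta features are random toys standing in for the GNN, BERT, and metadata encoder outputs described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical dimensions, chosen only for illustration.
D_IN, D_HID, N_EXPERTS = 8, 4, 3

# Toy modality features standing in for the GNN (graph), BERT (text),
# and learned metadata embeddings.
graph_feat = rng.normal(size=D_IN)
text_feat  = rng.normal(size=D_IN)
meta_feat  = rng.normal(size=D_IN)

def moe(x, expert_ws, gate_w):
    """Route one modality feature through N_EXPERTS linear experts and
    mix their outputs with softmax gating weights (the MoE routing)."""
    gate = softmax(gate_w @ x)                   # (N_EXPERTS,)
    outs = np.stack([w @ x for w in expert_ws])  # (N_EXPERTS, D_HID)
    return gate @ outs                           # (D_HID,)

def make_moe_params():
    experts = [rng.normal(size=(D_HID, D_IN)) for _ in range(N_EXPERTS)]
    gate = rng.normal(size=(N_EXPERTS, D_IN))
    return experts, gate

# One MoE block per modality, then a fusion layer over the concatenation.
fused = np.concatenate([moe(f, *make_moe_params())
                        for f in (graph_feat, text_feat, meta_feat)])
w_fuse = rng.normal(size=3 * D_HID)
spoiler_prob = 1.0 / (1.0 + np.exp(-(w_fuse @ fused)))  # sigmoid score
```

In the actual model the experts and fusion layer would be trained networks rather than random matrices, but the data flow is the same: modality features → gated expert mixtures → concatenation → fused prediction.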