MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts

📅 2024-03-08
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing spoiler detection methods for online movie review platforms suffer from three key limitations: (1) reliance on unimodal text modeling, (2) neglect of heterogeneous user–movie relational structures, and (3) strong dependence on specific spoiler linguistic patterns, leading to poor cross-genre generalization. To address these issues, we propose a trimodal spoiler detection framework that jointly integrates user–movie graph structure, review text, and metadata, augmented with a domain-aware Mixture-of-Experts (MoE) mechanism. Our approach employs Graph Neural Networks (GNNs) to encode structural relations, BERT for textual representation, and learned embeddings for metadata; it further introduces multimodal expert routing and a dynamic fusion layer to enable fine-grained domain adaptation. Evaluated on two benchmark datasets, our method achieves new state-of-the-art performance—improving accuracy by 2.56% and F1-score by 8.41%—while demonstrating significantly enhanced robustness and generalization across diverse movie genres.

📝 Abstract
Online movie review websites are valuable for information and discussion about movies. However, the massive number of spoiler reviews detracts from the movie-watching experience, making spoiler detection an important task. Previous methods simply focus on reviews' text content, ignoring the heterogeneity of information on the platform. For instance, the metadata and the corresponding user information of a review could be helpful. Besides, the spoiler language of movie reviews tends to be genre-specific, thus posing a domain generalization challenge for existing methods. To this end, we propose MMoE, a multi-modal network that utilizes information from multiple modalities to facilitate robust spoiler detection and adopts Mixture-of-Experts to enhance domain generalization. MMoE first extracts graph, text, and meta features from the user-movie network, the review's textual content, and the review's metadata respectively. To handle genre-specific spoilers, we then adopt a Mixture-of-Experts architecture to process information in the three modalities and promote robustness. Finally, we use an expert fusion layer to integrate the features from different perspectives and make predictions based on the fused embedding. Experiments demonstrate that MMoE achieves state-of-the-art performance on two widely used spoiler detection datasets, surpassing previous SOTA methods by 2.56% and 8.41% in accuracy and F1-score respectively. Further experiments also demonstrate MMoE's superiority in robustness and generalization.
Problem

Research questions and friction points this paper is trying to address.

Detecting spoilers in movie reviews using multi-modal information
Addressing genre-specific spoiler language domain generalization challenges
Improving robustness by integrating graph, text and metadata features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal network using graph, text, meta features
Mixture-of-Experts architecture for domain generalization
Expert fusion layer integrating multi-perspective features
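The pipeline the paper describes (per-modality features routed through Mixture-of-Experts blocks, then fused for classification) can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the feature dimensions, expert count, gating form, and the stand-in random features for the GNN/BERT/metadata encoders are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class ModalityMoE:
    """Mixture-of-Experts block for one modality (hypothetical shapes)."""
    def __init__(self, dim, n_experts):
        # each expert is a simple nonlinear projection; real experts would be MLPs
        self.experts = [rng.normal(size=(dim, dim)) / np.sqrt(dim)
                        for _ in range(n_experts)]
        self.gate = rng.normal(size=(dim, n_experts)) / np.sqrt(dim)

    def __call__(self, x):
        # gating network decides how much each expert contributes per review,
        # which is what allows genre-specific (domain-aware) specialization
        weights = softmax(x @ self.gate)                                  # (batch, n_experts)
        outs = np.stack([np.tanh(x @ W) for W in self.experts], axis=1)   # (batch, n_experts, dim)
        return (weights[:, :, None] * outs).sum(axis=1)                   # (batch, dim)

dim, n_experts, batch = 16, 4, 2
# stand-ins for the three modality encoders (GNN, BERT, metadata embeddings)
graph_feat = rng.normal(size=(batch, dim))
text_feat  = rng.normal(size=(batch, dim))
meta_feat  = rng.normal(size=(batch, dim))

moes = {m: ModalityMoE(dim, n_experts) for m in ("graph", "text", "meta")}

# expert fusion layer: here simply concatenation + a linear classifier head
fused = np.concatenate([moes["graph"](graph_feat),
                        moes["text"](text_feat),
                        moes["meta"](meta_feat)], axis=-1)                # (batch, 3*dim)
W_cls = rng.normal(size=(3 * dim, 2)) / np.sqrt(3 * dim)
pred = softmax(fused @ W_cls).argmax(axis=-1)   # 0/1: non-spoiler vs. spoiler
print(fused.shape, pred.shape)                  # (2, 48) (2,)
```

The key design point is that each modality gets its own gated pool of experts, so reviews from different genres can be routed to different experts before fusion, rather than forcing one shared representation.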
Zinan Zeng
Xi'an Jiaotong University
Sen Ye
Xi'an Jiaotong University
Zijian Cai
Xi'an Jiaotong University
Heng Wang
Xi'an Jiaotong University
Yuhan Liu
Xi'an Jiaotong University
Qinghua Zheng
Xi'an Jiaotong University
Minnan Luo
Professor, Xi'an Jiaotong University