Video-Based MPAA Rating Prediction: An Attention-Driven Hybrid Architecture Using Contrastive Learning

📅 2025-09-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Automated classification of videos into MPAA age ratings (G/PG/PG-13/R) faces three challenges: heavy reliance on labeled data, difficulty distinguishing borderline classes (e.g., PG-13 vs. R), and poor generalization. To address these, we propose a hybrid model that combines contextual contrastive learning with Bahdanau attention. Built on the LRCN (CNN+LSTM) architecture, it is evaluated with the NT-Xent, NT-logistic, and margin triplet contrastive losses to learn more discriminative representations. Bahdanau attention dynamically weights key frames, improving fine-grained discrimination between adjacent ratings. In the Contextual Contrastive Learning framework, the model achieves 88.0% accuracy and an F1 score of 0.8815, which we report as state of the art. The model is deployed as a web application for real-time rating classification, aimed at automated content-compliance review on streaming platforms.
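The summary names three contrastive objectives. The sketch below shows their standard SimCLR-style formulations in NumPy, assuming each clip yields two augmented views whose embeddings are L2-normalized. The temperature and margin values, the averaging over negatives, and all function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def _l2_normalize(x, eps=1e-12):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def _pair_setup(z_a, z_b):
    # Stack both views: rows i and i+N are two views of the same clip.
    z = _l2_normalize(np.concatenate([z_a, z_b], axis=0))
    n = z_a.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # positive partner index
    return z, n, pos


def nt_xent(z_a, z_b, temperature=0.5):
    """Normalized temperature-scaled cross-entropy (softmax over all other rows)."""
    z, n, pos = _pair_setup(z_a, z_b)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a row is never its own candidate
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()


def nt_logistic(z_a, z_b, temperature=0.5):
    """Logistic loss: push positive logits up and negative logits down independently."""
    z, n, pos = _pair_setup(z_a, z_b)
    sim = z @ z.T / temperature
    pos_mask = np.zeros_like(sim, dtype=bool)
    pos_mask[np.arange(2 * n), pos] = True
    neg_mask = ~pos_mask & ~np.eye(2 * n, dtype=bool)
    pos_term = np.logaddexp(0.0, -sim[pos_mask]).mean()  # -log sigmoid(s_pos)
    neg_term = np.logaddexp(0.0, sim[neg_mask]).mean()   # -log sigmoid(-s_neg)
    return pos_term + neg_term


def margin_triplet(z_a, z_b, margin=0.5):
    """Hinge loss: each negative should be at least `margin` less similar than the positive."""
    z, n, pos = _pair_setup(z_a, z_b)
    sim = z @ z.T  # plain cosine similarity, no temperature
    pos_sim = sim[np.arange(2 * n), pos][:, None]
    neg_mask = np.ones_like(sim, dtype=bool)
    np.fill_diagonal(neg_mask, False)
    neg_mask[np.arange(2 * n), pos] = False
    hinge = np.maximum(0.0, sim - pos_sim + margin)
    return hinge[neg_mask].mean()
```

All three losses reward embeddings in which the two views of one clip agree more than views of different clips; they differ in whether negatives compete through a shared softmax (NT-Xent), are penalized independently (NT-logistic), or only matter once they come within a margin of the positive (margin triplet).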

πŸ“ Abstract
The rapid growth of visual content consumption across platforms necessitates automated video classification for age-suitability standards like the MPAA rating system (G, PG, PG-13, R). Traditional methods struggle with large labeled-data requirements, poor generalization, and inefficient feature learning. To address these challenges, we employ contrastive learning for improved discrimination and adaptability, exploring three frameworks: Instance Discrimination, Contextual Contrastive Learning, and Multi-View Contrastive Learning. Our hybrid architecture integrates an LRCN (CNN+LSTM) backbone with a Bahdanau attention mechanism, achieving state-of-the-art performance in the Contextual Contrastive Learning framework, with 88% accuracy and an F1 score of 0.8815. By combining CNNs for spatial features, LSTMs for temporal modeling, and attention mechanisms for dynamic frame prioritization, the model excels at fine-grained borderline distinctions, such as differentiating PG-13 from R-rated content. We evaluate the model's performance across various contrastive loss functions, including NT-Xent, NT-logistic, and Margin Triplet, demonstrating the robustness of our proposed architecture. To ensure practical application, the model is deployed as a web application for real-time MPAA rating classification, offering an efficient solution for automated content compliance across streaming platforms.
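LRCN pairs a per-frame CNN (spatial features) with an LSTM that runs over the frame sequence (temporal modeling). A minimal NumPy sketch of that data flow follows. To keep it self-contained, the CNN is replaced by a hypothetical `frame_features` stand-in (channel-mean pooling plus a linear projection); the clip length, feature sizes, weight scales, and single LSTM layer are assumptions, not the authors' configuration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def frame_features(frames, w_proj):
    """Hypothetical stand-in for the per-frame CNN.
    frames: (T, H, W, 3) -> (T, d) by averaging each RGB channel and projecting."""
    pooled = frames.mean(axis=(1, 2))  # (T, 3) channel means per frame
    return np.tanh(pooled @ w_proj)    # (T, d)


def lstm_forward(x, W, U, b):
    """Single-layer LSTM over time.
    x: (T, d), W: (d, 4h), U: (h, 4h), b: (4h,). Returns every hidden state, (T, h)."""
    h_dim = U.shape[0]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    states = []
    for x_t in x:
        gates = x_t @ W + h @ U + b
        i, f, g, o = np.split(gates, 4)            # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)                 # update cell memory
        h = o * np.tanh(c)                         # expose gated memory as hidden state
        states.append(h)
    return np.stack(states)


# Usage on a random 16-frame clip.
rng = np.random.default_rng(0)
T, d, h = 16, 32, 64
clip = rng.random((T, 64, 64, 3))
feats = frame_features(clip, rng.normal(scale=0.1, size=(3, d)))
hs = lstm_forward(feats, rng.normal(scale=0.1, size=(d, 4 * h)),
                  rng.normal(scale=0.1, size=(h, 4 * h)), np.zeros(4 * h))
print(hs.shape)  # -> (16, 64)
```

Returning every hidden state, rather than only the last one, is what allows an attention layer to weight individual frames afterwards.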
Problem

Research questions and friction points this paper is trying to address.

Automated video classification for MPAA age-suitability standards
Addressing poor generalization and inefficient feature learning
Distinguishing fine-grained borderline ratings such as PG-13 and R
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning for improved discrimination and adaptability
Hybrid LRCN backbone with Bahdanau attention mechanism
Combines CNNs, LSTMs, and attention for dynamic frame prioritization
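Bahdanau (additive) attention scores each frame's LSTM state, normalizes the scores with a softmax, and pools the states into one weighted clip representation. The NumPy sketch below assumes the last hidden state serves as the query and that a linear softmax head maps the pooled vector to the four MPAA classes; the parameter names (`W_h`, `W_q`, `v`, `W_out`, `b_out`) and shapes are illustrative, not taken from the paper.

```python
import numpy as np

CLASSES = ["G", "PG", "PG-13", "R"]


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


def bahdanau_pool(hs, W_h, W_q, v):
    """Additive attention over frame states hs: (T, h).
    Score e_t = v^T tanh(W_h^T h_t + W_q^T q), with q = last hidden state (assumed).
    W_h: (h, a), W_q: (h, a), v: (a,). Returns (context (h,), weights (T,))."""
    query = hs[-1]
    scores = np.tanh(hs @ W_h + query @ W_q) @ v  # (T,) one score per frame
    alpha = softmax(scores)                       # attention weights, sum to 1
    context = alpha @ hs                          # weighted sum of frame states
    return context, alpha


def classify(hs, params):
    """Attention-pool the frame states, then apply a linear softmax head."""
    context, alpha = bahdanau_pool(hs, params["W_h"], params["W_q"], params["v"])
    probs = softmax(context @ params["W_out"] + params["b_out"])
    return CLASSES[int(np.argmax(probs))], probs, alpha


# Usage with random frame states and parameters.
rng = np.random.default_rng(1)
T, h, a = 16, 64, 32
hs = np.tanh(rng.normal(size=(T, h)))
params = {
    "W_h": rng.normal(scale=0.1, size=(h, a)),
    "W_q": rng.normal(scale=0.1, size=(h, a)),
    "v": rng.normal(size=a),
    "W_out": rng.normal(scale=0.1, size=(h, len(CLASSES))),
    "b_out": np.zeros(len(CLASSES)),
}
label, probs, alpha = classify(hs, params)
```

The returned `alpha` vector is the per-frame weighting, so inspecting it shows which frames contributed most to the predicted rating.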
Dipta Neogi
Department of Electrical and Computer Engineering, North South University, Dhaka, 1229, Bangladesh
Nourash Azmine Chowdhury
Department of Electrical and Computer Engineering, North South University, Dhaka, 1229, Bangladesh
Muhammad Rafsan Kabir
Department of Electrical and Computer Engineering, North South University
machine learning, natural language processing, computer vision
Mohammad Ashrafuzzaman Khan
Associate Professor of CS, North South University
Distributed Computing, Machine Learning, Artificial Intelligence, Big Data