🤖 AI Summary
To address the challenge of detecting implicit offensiveness and bias in multimodal memes within Singapore's multicultural context, where text-dominant moderation systems often fail, this paper proposes the first end-to-end, culture-aware moderation framework. The work introduces a 112K-sample Singapore-localized meme dataset, annotated with GPT-4V assistance, and integrates OCR-based text extraction, translation of low-resource languages (including dialects and code-mixed utterances), and fine-tuning of a 7B-parameter vision-language model (VLM). Designed for both cultural sensitivity and computational efficiency, the framework achieves 80.62% accuracy and an AUROC of 0.8192 on a held-out test set. All components, including the trained model, source code, and dataset, are publicly released. This work substantially improves human review efficiency and establishes a reproducible paradigm for low-resource, multilingual, multimodal content governance.
📝 Abstract
Traditional online content moderation systems struggle to classify modern multimodal means of communication such as memes, a highly nuanced and information-dense medium. This task is especially hard in a culturally diverse society like Singapore, where low-resource languages are used and extensive knowledge of local context is needed to interpret online content. We curate a large collection of 112K memes labeled by GPT-4V for fine-tuning a VLM to classify offensive memes in the Singapore context. We show the effectiveness of fine-tuned VLMs on our dataset, and propose a pipeline comprising OCR, translation, and a 7-billion-parameter-class VLM. Our solution reaches 80.62% accuracy and 0.8192 AUROC on a held-out test set, and can greatly aid humans in moderating online content. The dataset, code, and model weights will be open-sourced at https://github.com/aliencaocao/vlm-for-memes-aisg.
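The abstract describes a three-stage pipeline: OCR extracts overlaid text from the meme, a translation step normalizes low-resource or code-mixed text to English, and a fine-tuned VLM scores the image-plus-text pair for offensiveness. The following is a minimal structural sketch of that flow; every component here (`run_ocr`, `translate_to_english`, `vlm_classify`, the Singlish glossary entry) is a hypothetical stand-in for illustration, not the paper's actual implementation, which uses real OCR, translation, and a fine-tuned 7B VLM.

```python
# Structural sketch of the OCR -> translation -> VLM moderation pipeline.
# All components below are placeholders, not the paper's implementation.

from dataclasses import dataclass


@dataclass
class ModerationResult:
    extracted_text: str      # text found in the meme image
    translated_text: str     # text normalized to English
    offensive_score: float   # estimated probability the meme is offensive


def run_ocr(image_bytes: bytes) -> str:
    """Stand-in for OCR: extract overlaid text from the meme image."""
    return image_bytes.decode("utf-8", errors="ignore")  # placeholder


def translate_to_english(text: str) -> str:
    """Stand-in for low-resource / code-mixed translation."""
    glossary = {"sian": "bored/frustrated"}  # hypothetical Singlish entry
    return " ".join(glossary.get(word, word) for word in text.split())


def vlm_classify(image_bytes: bytes, text: str) -> float:
    """Stand-in for the fine-tuned VLM's offensiveness probability."""
    return 0.9 if "offensive" in text else 0.1  # placeholder heuristic


def moderate(image_bytes: bytes) -> ModerationResult:
    """Run the three stages in sequence on one meme."""
    raw = run_ocr(image_bytes)
    english = translate_to_english(raw)
    score = vlm_classify(image_bytes, english)
    return ModerationResult(raw, english, score)


result = moderate(b"very sian meme")
print(result.translated_text)   # "very bored/frustrated meme"
```

In the real system the score would be thresholded to route borderline memes to human reviewers, which is how the pipeline aids rather than replaces human moderation.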