AI Summary
NSFW images increasingly exhibit fine-grained obfuscation and complex semantic features, enabling evasion of existing deep learning-based detection models; meanwhile, heterogeneous regulatory policies across platforms and regions exacerbate annotation bias and auditing inconsistency. To address these challenges, we propose an adaptive moderation framework featuring multi-granularity semantic awareness and plug-and-play regulatory compliance. Specifically, we leverage vision-language models (VLMs) for deep semantic understanding; design a rule-driven dynamic decision module to enable real-time adaptation to heterogeneous moderation policies; and introduce content disentanglement and controversy-aware re-annotation to mitigate label noise in public NSFW datasets. Evaluated on diverse NSFW image categories, our method achieves up to a 54.3% absolute accuracy improvement over baselines. It demonstrates strong generalization across categories, deployment scenarios, and underlying VLM backbones, and has been validated in production environments.
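To illustrate the "plug-and-play" idea behind a rule-driven dynamic decision module, the sketch below shows one plausible shape for it: per-platform policies expressed as swappable rules over VLM-derived semantic labels. All names (`PolicyRule`, `ModerationPolicy`, the label `"suggestive"`, and the confidence values) are hypothetical; the paper does not specify this API.

```python
from dataclasses import dataclass, field

@dataclass
class PolicyRule:
    label: str        # semantic label produced by the VLM (e.g. "suggestive")
    threshold: float  # minimum confidence required to trigger this rule
    verdict: str      # moderation outcome: "block", "flag", or "allow"

@dataclass
class ModerationPolicy:
    name: str
    rules: list = field(default_factory=list)

    def decide(self, semantic_labels: dict) -> str:
        # Apply the first rule whose label confidence meets its threshold.
        # Because policies are plain data, they can be swapped at runtime
        # to match heterogeneous platform or regional regulations.
        for rule in self.rules:
            if semantic_labels.get(rule.label, 0.0) >= rule.threshold:
                return rule.verdict
        return "allow"

# Two platforms with different thresholds for the same semantic label.
strict = ModerationPolicy("platform_a", [PolicyRule("suggestive", 0.3, "block")])
lenient = ModerationPolicy("platform_b", [PolicyRule("suggestive", 0.8, "flag")])

labels = {"suggestive": 0.55}  # stand-in for VLM output confidences
print(strict.decide(labels))   # -> block
print(lenient.decide(labels))  # -> allow
```

The point of the sketch is only the separation of concerns: the VLM supplies semantics once, and the decision layer reinterprets them under whichever policy is currently plugged in.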
Abstract
Not Safe/Suitable for Work (NSFW) content is rampant on social networks and causes serious harm to citizens, especially minors. Current detection methods mainly rely on deep learning-based image recognition and classification. However, NSFW images are now presented in increasingly sophisticated ways, often using image details and complex semantics to obscure their true nature or attract more views. Although still understandable to humans, these images often evade existing detection methods, posing a significant threat. Further complicating the issue, varying regulations across platforms and regions create additional challenges for effective moderation, leading to detection bias and reduced accuracy. To address these challenges, we propose VModA, a general and effective framework that adapts to diverse moderation rules and handles complex, semantically rich NSFW content across categories. Experimental results show that VModA significantly outperforms existing methods, achieving up to a 54.3% accuracy improvement across NSFW types, including those with complex semantics. Further experiments demonstrate that our method exhibits strong adaptability across categories, scenarios, and base VLMs. We also identified inconsistently and controversially labeled samples in public NSFW benchmark datasets, re-annotated them, and submitted corrections to the original maintainers; the maintainers of two datasets have confirmed the updates so far. Additionally, we evaluate VModA in real-world scenarios to demonstrate its practical effectiveness.