🤖 AI Summary
This work addresses the limitations of existing content moderation systems, which rely on platform-specific supervision and static taxonomies and therefore struggle to generalize to new domains or detect emerging illicit promotions. The study introduces in-context learning (ICL) to this task for the first time, proposing a unified, fine-tuning-free detection framework that guides large language models at inference time with contextual examples and incorporates a two-stage label distillation mechanism. With only 1/22 of the labeled data required by fine-tuned baselines, the method achieves comparable performance, attaining 92.6% accuracy on a real-world dataset of 200,000 samples. Notably, 61.8% of the correctly identified cases involve covert content missed by current systems, and the approach uncovers eight novel categories of illicit promotion, enabling cross-platform, zero-adaptation detection of previously unknown threats.
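The core of the fine-tuning-free setup described above is that labeled demonstrations are supplied in the prompt itself rather than used for gradient updates. A minimal sketch of such a few-shot prompt builder is below; the function name, instruction wording, and label set are illustrative assumptions, not the paper's actual prompt template.

```python
def build_icl_prompt(demonstrations, query):
    """Assemble a few-shot ICL prompt: labeled demonstrations followed by
    the query post. No fine-tuning occurs; the model is steered purely at
    inference time by the in-context examples.

    demonstrations: list of (post_text, label) pairs
    query: the unlabeled post to classify
    """
    # Hypothetical instruction header; the paper's template may differ.
    parts = ["Classify each post as BENIGN or a specific illicit-promotion category.\n"]
    for text, label in demonstrations:
        parts.append(f"Post: {text}\nLabel: {label}\n")
    # Leave the final label blank for the model to complete.
    parts.append(f"Post: {query}\nLabel:")
    return "\n".join(parts)
```

The resulting string would be sent to any instruction-following LLM; swapping platforms or threat categories only requires swapping the demonstrations, which is what makes the approach adaptation-free.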
📝 Abstract
Illicit online promotion is a persistent threat that evolves to evade detection. Existing moderation systems remain tethered to platform-specific supervision and static taxonomies, a reactive paradigm that struggles to generalize across domains or uncover novel threats.
This paper presents a systematic study of In-Context Learning (ICL) as a unified framework for illicit promotion detection. Through rigorous analysis, we show that properly configured ICL matches the performance of fine-tuned models while using 22x fewer labeled examples. We demonstrate three key capabilities: (1) Generalization to unseen threats: ICL generalizes to new illicit categories without category-specific demonstrations, with a performance drop of less than 6% for most evaluated categories. (2) Autonomous discovery: A novel two-stage pipeline distills 2,900 free-form labels into coherent taxonomies, surfacing eight previously undocumented illicit categories such as usury and illegal immigration. (3) Cross-platform generalization: Deployed on 200,000 real-world samples from search engines and Twitter without adaptation, ICL achieves 92.6% accuracy. Furthermore, 61.8% of its uniquely flagged samples correspond to borderline or obfuscated content missed by existing detectors.
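The two-stage distillation in capability (2) can be illustrated with a toy sketch: a normalization pass that collapses surface variants of free-form labels, followed by a grouping pass that merges similar labels into candidate categories. This is an assumption-laden stand-in (the paper's pipeline presumably uses an LLM for both stages); the function name, similarity measure, and threshold here are illustrative only.

```python
from difflib import SequenceMatcher

def distill_labels(free_form_labels, sim_threshold=0.75):
    """Toy two-stage label distillation.

    Stage 1: normalize and deduplicate raw free-form labels.
    Stage 2: greedily merge similar labels into candidate categories
    using plain string similarity (a crude proxy for semantic grouping).
    Returns a list of (representative, members) tuples.
    """
    # Stage 1: lowercase, trim, and unify hyphens, then deduplicate.
    normalized = sorted({lbl.strip().lower().replace("-", " ")
                         for lbl in free_form_labels})
    # Stage 2: greedy agglomeration against each category's representative.
    categories = []
    for lbl in normalized:
        for rep, members in categories:
            if SequenceMatcher(None, lbl, rep).ratio() >= sim_threshold:
                members.append(lbl)
                break
        else:
            categories.append((lbl, [lbl]))
    return categories
```

Replacing `SequenceMatcher` with embedding similarity or an LLM judgment call would move this sketch closer to a realistic taxonomy-induction pipeline while keeping the same two-stage shape.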
Our findings position ICL as a new paradigm for content moderation, combining the precision of specialized classifiers with cross-platform generalization and autonomous threat discovery. By shifting to inference-time reasoning, ICL offers a path toward proactively adaptive moderation systems.