🤖 AI Summary
This work addresses the challenge of detecting cross-session, progressive, and recurrent malicious behaviors, such as fraud, on live streaming platforms, which traditional session-local methods struggle to identify. The authors propose CS-VAR, a novel system that, for the first time, applies retrieval-augmented generation (RAG) with large language models to model cross-session behavioral evidence. Through knowledge distillation, the LLM's global risk insights are transferred to a lightweight domain-specific model, enabling efficient and interpretable real-time risk assessment. Evaluated on a large-scale industrial dataset, CS-VAR achieves state-of-the-art performance; its online deployment significantly improves content moderation efficiency while delivering fine-grained, explainable local risk signals.
📝 Abstract
The rise of live streaming has transformed online interaction, enabling massive real-time engagement but also exposing platforms to complex risks such as scams and coordinated malicious behaviors. Detecting these risks is challenging because harmful actions often accumulate gradually and recur across seemingly unrelated streams. To address this, we propose CS-VAR (Cross-Session Evidence-Aware Retrieval-Augmented Detector) for live streaming risk assessment. In CS-VAR, a lightweight, domain-specific model performs fast session-level risk inference, guided during training by a Large Language Model (LLM) that reasons over retrieved cross-session behavioral evidence and transfers its local-to-global insights to the small model. This design enables the small model to recognize recurring patterns across streams, perform structured risk assessment, and maintain efficiency for real-time deployment. Extensive offline experiments on large-scale industrial datasets, combined with online validation, demonstrate the state-of-the-art performance of CS-VAR. Furthermore, CS-VAR provides interpretable, localized signals that effectively empower real-world moderation for live streaming.
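The training scheme described above, in which an LLM teacher reasons over retrieved cross-session evidence and distills its soft risk judgments into a small student model, can be illustrated with a minimal sketch. This is not the paper's implementation: the 2-d embeddings, the `(risky, benign)` label space, the cosine-similarity retriever, and the teacher's soft scores are all hypothetical stand-ins for CS-VAR's actual components.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def retrieve_cross_session(query_emb, session_embs, k=2):
    """Return indices of the k past sessions most similar to the live one
    (a stand-in for CS-VAR's cross-session evidence retrieval)."""
    ranked = sorted(range(len(session_embs)),
                    key=lambda i: -cosine(query_emb, session_embs[i]))
    return ranked[:k]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_probs, temperature=2.0):
    """KL(teacher || student) at temperature T: a standard distillation
    objective for transferring the teacher's soft risk scores."""
    student_probs = softmax([z / temperature for z in student_logits])
    return sum(p * (math.log(p + 1e-8) - math.log(q + 1e-8))
               for p, q in zip(teacher_probs, student_probs))

# Hypothetical toy data: 2-d embeddings of past sessions and a live query session.
past_sessions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.05]
evidence_ids = retrieve_cross_session(query, past_sessions, k=2)

# Hypothetical teacher distribution over (risky, benign), produced after
# the LLM has read the retrieved cross-session evidence.
teacher_soft = [0.8, 0.2]
loss = kd_loss(student_logits=[0.5, -0.5], teacher_probs=teacher_soft)
```

At deployment only the distilled student runs, so inference cost is that of the lightweight model; the retriever and LLM teacher are needed only during training, which is what makes the design compatible with real-time moderation.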