🤖 AI Summary
Online content moderation faces significant challenges due to the complexity and variability of user-generated content, as well as the limited generalization and poor interpretability of traditional approaches. This work proposes the first moderation framework that integrates analogical reasoning with end-to-end optimization, using large language models to jointly train three modules: analogical retrieval, dynamic rule generation, and classification. A context-aware hierarchical reasoning mechanism is introduced to improve the coherence and adaptability of the reasoning process. The proposed method substantially outperforms baseline approaches, including rule-injected fine-tuning and static retrieval-augmented generation, in both moderation accuracy and rule quality. Human evaluations and generalization tests with external models confirm that the generated rules are highly interpretable and generalize well across diverse content scenarios.
📝 Abstract
Content moderation on online platforms faces persistent challenges due to the evolving complexity of user-generated content and the limitations of traditional rule-based and machine learning approaches. While recent advances in large language models (LLMs) have enabled more sophisticated moderation via direct prompting or fine-tuning, these approaches often show limited generalization, limited interpretability, and poor adaptability to unseen or ambiguous cases.
In this work, we propose a novel moderation framework that leverages analogical examples to improve rule induction and decision reliability. Our approach jointly optimizes analogical retrieval, rule generation, and moderation classification end to end, allowing moderation rules to adapt dynamically to diverse content scenarios. Through comprehensive experiments, we demonstrate that our method significantly outperforms both rule-injected fine-tuning baselines and multi-stage static retrieval-augmented generation (RAG) pipelines in moderation accuracy and rule quality. Further evaluations, including human assessments and generalization tests with external models, confirm that our framework produces rules with better clarity, interpretability, and applicability than the baselines. These findings show that analogical, example-driven methods can advance robust, explainable, and generalizable content moderation in real-world applications.