Classification is a RAG problem: A case study on hate speech detection

📅 2025-08-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenge of rapidly adapting hate speech detection systems to changing policies, this paper proposes the Contextual Policy Engine (CPE), an agentic retrieval-augmented generation (RAG) framework. CPE reformulates classification as semantic retrieval followed by reasoning over policy text: it retrieves up-to-date policy fragments at inference time, so a policy change takes effect without model retraining or fine-tuning. Its key contributions are: (1) adjustable policy enforcement that supports fine-grained protection of specific identity groups; (2) inherent interpretability grounded in the retrieved policy snippets; and (3) plug-and-play policy updates that preserve overall performance. Experiments on mainstream hate speech benchmarks show that CPE achieves accuracy comparable to leading commercial systems while improving the adaptability, controllability, and transparency of content moderation systems.

📝 Abstract

Robust content moderation requires classification systems that can quickly adapt to evolving policies without costly retraining. We present classification using Retrieval-Augmented Generation (RAG), which shifts traditional classification tasks from determining the correct category in accordance with pre-trained parameters to evaluating content in relation to contextual knowledge retrieved at inference. In hate speech detection, this transforms the task from "is this hate speech?" to "does this violate the hate speech policy?" Our Contextual Policy Engine (CPE) - an agentic RAG system - demonstrates this approach and offers three key advantages: (1) robust classification accuracy comparable to leading commercial systems, (2) inherent explainability via retrieved policy segments, and (3) dynamic policy updates without model retraining. Through three experiments, we demonstrate strong baseline performance and show that the system can apply fine-grained policy control by correctly adjusting protection for specific identity groups without requiring retraining or compromising overall performance. These findings establish that RAG can transform classification into a more flexible, transparent, and adaptable process for content moderation and wider classification problems.
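The retrieve-then-judge flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' CPE implementation: `PolicySegment`, `retrieve`, `classify`, and the `judge` callable are hypothetical names, a bag-of-words cosine similarity stands in for whatever retriever the system actually uses, and `judge` stands in for a call to a language model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PolicySegment:
    """One retrievable piece of policy text (hypothetical structure)."""
    segment_id: str
    text: str


def _bag(text: str) -> Counter:
    # Lowercased word counts; a toy stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(content: str, policy: list[PolicySegment], k: int = 2) -> list[PolicySegment]:
    """Return the k policy segments most similar to the content."""
    query = _bag(content)
    ranked = sorted(policy, key=lambda seg: _cosine(query, _bag(seg.text)), reverse=True)
    return ranked[:k]


def classify(
    content: str,
    policy: list[PolicySegment],
    judge: Callable[[str], str],
    k: int = 2,
) -> dict:
    """Ask the judge whether the content violates the retrieved policy segments.

    `judge` is an assumed interface: it takes a prompt string and returns
    a verdict string. In practice it would wrap a language model call.
    """
    segments = retrieve(content, policy, k)
    policy_block = "\n".join(f"[{s.segment_id}] {s.text}" for s in segments)
    prompt = (
        "Policy segments:\n" + policy_block + "\n\n"
        "Content:\n" + content + "\n\n"
        "Does the content violate the policy? Answer VIOLATION or NO_VIOLATION."
    )
    verdict = judge(prompt).strip()
    # The retrieved segment ids double as the explanation for the verdict.
    return {"verdict": verdict, "evidence": [s.segment_id for s in segments]}
```

In this sketch, changing the policy means editing the `policy` list passed to `classify`; nothing is retrained, which is the property the abstract emphasizes. The returned `evidence` ids illustrate how retrieved segments can serve as the explanation for a decision.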

Problem

Research questions and friction points this paper is trying to address.

Adapting classification systems to evolving policies without retraining

Transforming hate speech detection into policy violation evaluation

Enabling dynamic policy updates without compromising classification accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Retrieval-Augmented Generation for classification

Dynamic policy updates without retraining the model

Explainable via retrieved policy segments
