ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

📅 2025-06-12

🤖 AI Summary

Chinese harmful content detection is hampered by scarce labeled data, narrow category coverage, and the absence of dedicated benchmarks. To address these gaps, we introduce ChineseHarm-Bench, a professionally annotated benchmark built entirely from real-world data and covering six representative harm categories. It comes with an interpretable knowledge rule base produced during annotation. Methodologically, we propose a lightweight detection approach that combines human-annotated domain knowledge rules with the implicit knowledge of large language models (LLMs). Professional annotation ensures label quality, while knowledge rule modeling, LLM knowledge distillation, and knowledge-augmented fine-tuning together improve the performance of small models. Experiments show that these lightweight models achieve detection performance comparable to state-of-the-art (SOTA) LLMs on ChineseHarm-Bench. The benchmark dataset, knowledge rule base, and source code are publicly released.
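Claiming that a small model is "comparable" to a large one comes down to scoring both on the same labeled test split. The sketch below shows such a comparison. It assumes plain accuracy plus macro-averaged F1 over category labels (the summary does not name the exact metrics). The label strings `cat_A`, `cat_B`, `none` and all predictions are hypothetical placeholders, not data from the benchmark.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p where it was wrong
            fn[g] += 1  # missed the true label g
    scores = []
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels and predictions, for illustration only.
gold = ["cat_A", "cat_B", "none", "none", "cat_A"]
small_pred = ["cat_A", "none", "none", "none", "cat_A"]
large_pred = ["cat_A", "cat_B", "none", "cat_A", "cat_A"]

for name, pred in (("small", small_pred), ("large", large_pred)):
    print(name, accuracy(gold, pred), round(macro_f1(gold, pred), 4))
```

In this toy example both predictors reach the same accuracy yet differ in macro-F1, because the small one never recovers `cat_B`. Reporting a per-category metric alongside accuracy makes that kind of gap visible when categories are imbalanced.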

📝 Abstract

Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models, enabling smaller models to achieve performance comparable to state-of-the-art LLMs. Code and data are available at https://github.com/zjunlp/ChineseHarm-bench.
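The abstract says the rule base supplies explicit expert knowledge to an LLM. One plausible way to do that is to serialize the rules into the model input. The sketch below assumes a flat list of rules, each tagged with a category label, and groups them ahead of the text to classify. `KnowledgeRule`, `build_prompt`, the prompt wording, and the placeholder labels are all hypothetical; they are not the repository's actual format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeRule:
    category: str      # hypothetical category label
    description: str   # expert-written rule text


def build_prompt(text: str, rules: list[KnowledgeRule], labels: list[str]) -> str:
    """Group rules by category, list them, then ask for exactly one label."""
    lines = ["You are a content moderator. Use the rules below to classify the text."]
    for label in labels:
        matching = [r.description for r in rules if r.category == label]
        if not matching:
            continue  # labels without rules (e.g. a benign class) get no section
        lines.append(f"[{label}]")
        lines.extend(f"- {d}" for d in matching)
    lines.append("Answer with exactly one label from: " + ", ".join(labels))
    lines.append(f"Text: {text}")
    return "\n".join(lines)


# Placeholder rules and labels, for illustration only.
rules = [
    KnowledgeRule("cat_A", "placeholder rule for category A"),
    KnowledgeRule("cat_B", "placeholder rule for category B"),
]
labels = ["cat_A", "cat_B", "none"]
prompt = build_prompt("example input text", rules, labels)
print(prompt)
```

The same string could serve either as a query to a large LLM or as the input side of a fine-tuning example for a smaller model. The abstract does not say whether the released baseline formats its inputs this way.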

Problem

Research questions and friction points this paper is trying to address.

Lack of Chinese datasets for harmful content detection

Need for an expert-annotated benchmark for Chinese harmful content detection

Integration of human knowledge and LLMs for improved detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive Chinese harmful content detection benchmark

Knowledge rule base for expert-guided LLM detection

Knowledge-augmented baseline integrating human-annotated knowledge rules with implicit LLM knowledge