JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of evaluating large language models' (LLMs) capability to detect self-destructive content (e.g., drug overdose, eating disorders, self-harm) in bilingual Chinese–Japanese social media. We introduce JiraiBench, the first bilingual benchmark tailored to "Jirai" (literally "landmine"), a Japanese-origin online subculture associated with self-destructive behavior; the benchmark comprises 10,419 authentic Chinese and 5,000 Japanese social media posts. Methodologically, we employ multi-dimensional human annotation (three behavioral categories plus cultural intensity scoring), bilingual consistency evaluation, and cross-lingual zero-shot/fine-tuning transfer experiments. Key contributions include: (1) the first empirical demonstration that cultural proximity can outweigh linguistic similarity in improving detection performance; (2) the discovery that Japanese prompts outperform Chinese prompts for identifying self-destructive content in Chinese text; and (3) validation of effective cross-lingual knowledge transfer without target-language training. Experiments across four state-of-the-art models confirm that culture-aware modeling significantly enhances robustness and generalization.

📝 Abstract
This paper introduces JiraiBench, the first bilingual benchmark for evaluating large language models' effectiveness in detecting self-destructive content across Chinese and Japanese social media communities. Focusing on the transnational "Jirai" (landmine) online subculture, which encompasses multiple forms of self-destructive behavior including drug overdose, eating disorders, and self-harm, we present a comprehensive evaluation framework incorporating both linguistic and cultural dimensions. Our dataset comprises 10,419 Chinese posts and 5,000 Japanese posts with multidimensional annotation along three behavioral categories, achieving substantial inter-annotator agreement. Experimental evaluations across four state-of-the-art models reveal significant performance variations based on instructional language, with Japanese prompts unexpectedly outperforming Chinese prompts when processing Chinese content. This emergent cross-cultural transfer suggests that cultural proximity can sometimes outweigh linguistic similarity in detection tasks. Cross-lingual transfer experiments with fine-tuned models further demonstrate the potential for knowledge transfer between these language systems without explicit target-language training. These findings highlight the need for culturally informed approaches to multilingual content moderation and provide empirical evidence for the importance of cultural context in developing more effective detection systems for vulnerable online communities.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs in detecting self-destructive content across Chinese and Japanese social media
Analyzing cultural and linguistic impacts on model performance in content moderation
Exploring cross-lingual knowledge transfer for detecting harmful online behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bilingual benchmark for self-destructive content detection
Multidimensional annotation with inter-annotator agreement
Evidence that cross-cultural transfer enhances detection performance