🤖 AI Summary
This paper addresses the challenge of evaluating large language models' (LLMs) capability to detect self-destructive content (e.g., drug overdose, eating disorders, self-harm) in bilingual Chinese–Japanese social media. We introduce JiraiBench, the first bilingual benchmark tailored to the "Jirai" (landmine) subculture, a Japanese-origin online subculture associated with self-destructive behavior, comprising 10,419 authentic Chinese and 5,000 Japanese social media posts. Methodologically, we employ multi-dimensional human annotation (three behavioral categories plus cultural intensity scoring), bilingual consistency evaluation, and cross-lingual zero-shot/fine-tuning transfer experiments. Key contributions include: (1) the first empirical demonstration that cultural proximity can outweigh linguistic similarity in improving detection performance; (2) the discovery that Japanese prompts outperform Chinese prompts for identifying self-destructive content in Chinese text; and (3) validation of effective cross-lingual knowledge transfer without target-language training. Experiments across four state-of-the-art models confirm that culture-aware modeling significantly enhances robustness and generalization.
📝 Abstract
This paper introduces JiraiBench, the first bilingual benchmark for evaluating large language models' effectiveness in detecting self-destructive content across Chinese and Japanese social media communities. Focusing on the transnational "Jirai" (landmine) online subculture that encompasses multiple forms of self-destructive behaviors including drug overdose, eating disorders, and self-harm, we present a comprehensive evaluation framework incorporating both linguistic and cultural dimensions. Our dataset comprises 10,419 Chinese posts and 5,000 Japanese posts with multidimensional annotation along three behavioral categories, achieving substantial inter-annotator agreement. Experimental evaluations across four state-of-the-art models reveal significant performance variations based on instructional language, with Japanese prompts unexpectedly outperforming Chinese prompts when processing Chinese content. This emergent cross-cultural transfer suggests that cultural proximity can sometimes outweigh linguistic similarity in detection tasks. Cross-lingual transfer experiments with fine-tuned models further demonstrate the potential for knowledge transfer between these language systems without explicit target-language training. These findings highlight the need for culturally informed approaches to multilingual content moderation and provide empirical evidence for the importance of cultural context in developing more effective detection systems for vulnerable online communities.