🤖 AI Summary
This work addresses the scarcity of safety evaluation resources for Kazakh by presenting KZ-SafetyPrompts, a systematically constructed dataset of 5,717 prompts written natively in Kazakh (Cyrillic script) across 11 risk categories. The prompts are designed to resemble realistic user queries, often in a teen or child style, and each comes with an English translation to support cross-lingual analysis. The authors document the prompt-writing protocol, the labeling procedures (including decision rules for borderline cases), and the quality-control steps (schema standardization, completeness checks, and deduplication), and they map the categories to widely used safety taxonomies. Baseline testing with GPT-4o yields an overall refusal rate of 28.2%, ranging from 5.5% to 53.8% across categories, which suggests that Kazakh-language evaluation exposes category-level safety gaps that English-only evaluation misses.
📝 Abstract
Kazakh is underrepresented in resources for evaluating the safety behavior of large language models. We present KZ-SafetyPrompts, a Kazakh prompt dataset for safety evaluation across eleven categories covering common risk areas such as self-harm, violence, child exploitation, sexual content, racist content, radicalization, and regulated goods or illegal activities. The dataset contains 5,717 prompts written natively in Kazakh (Cyrillic), organized by category, with English translations for cross-lingual analysis. Prompts resemble realistic user queries, often in a teen or child style, and are phrased as intent prompts without procedural instructions. We document the writing protocol, labeling procedures (including borderline-case decision rules), and quality-control steps (schema standardization, completeness checks, and deduplication). We also align the categories with widely used safety taxonomies to support integration with existing evaluation pipelines. Baseline results with GPT-4o show an overall refusal rate of 28.2%, varying from 5.5% to 53.8% across categories, indicating that Kazakh prompts expose category-specific safety gaps not captured by English-only evaluation.
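The quality-control steps and the refusal-rate figures mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the field names (`prompt_kk`, `prompt_en`, `category`), the normalization rule used for deduplication, and the assumption that each model response has already been labeled as refused or not are all hypothetical choices made for illustration.

```python
import unicodedata
from collections import Counter

# Hypothetical schema; the dataset's actual field names are not given in the abstract.
REQUIRED_FIELDS = ("prompt_kk", "prompt_en", "category")


def normalize(text):
    """Unicode-normalize, collapse whitespace, and case-fold a prompt."""
    return " ".join(unicodedata.normalize("NFKC", text).split()).casefold()


def quality_control(records):
    """Keep complete records and drop duplicates within each category."""
    seen = set()
    kept = []
    for rec in records:
        # Completeness check: every required field must be a non-empty string.
        if any(not (rec.get(field) or "").strip() for field in REQUIRED_FIELDS):
            continue
        # Deduplication key (an assumption): category plus normalized Kazakh prompt.
        key = (rec["category"], normalize(rec["prompt_kk"]))
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


def refusal_rates(results):
    """Compute overall and per-category refusal rates.

    `results` is an iterable of (category, refused) pairs, where `refused`
    is a bool produced by some upstream refusal labeling step.
    """
    totals = Counter()
    refusals = Counter()
    for category, refused in results:
        totals[category] += 1
        refusals[category] += int(refused)
    per_category = {c: refusals[c] / totals[c] for c in totals}
    n = sum(totals.values())
    overall = sum(refusals.values()) / n if n else 0.0
    return overall, per_category
```

Here the overall rate is computed over all prompts pooled together, so larger categories weigh more; averaging the per-category rates instead would generally give a different number, and the abstract does not say which convention the 28.2% figure follows.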