The Age of Curiosity Meets the Age of AI: Benchmarking Child Safety in Large Language Models

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

career value

175K/year

🤖 AI Summary

This study addresses the lack of age-appropriate safety evaluations for large language models (LLMs) targeting children aged 7–11, who may receive responses misaligned with their cognitive development. To bridge this gap, the authors propose KIDBench, a novel evaluation benchmark grounded in developmental psychology theory, featuring single- and multi-turn dialogue scenarios across ten thematic categories and supporting multilingual and multicultural contexts. They introduce an innovative LLM-as-a-Judge scoring mechanism and develop two specialized models—KIDGuardLlama for assessment and KIDLlama for response generation—leveraging implicit and explicit age-prompting strategies alongside fine-tuning. Experiments demonstrate that implicit prompting improves safety scores by 9–47%, with explicit instructions yielding an additional 10–30% gain; however, response quality declines by 6–24% in multi-turn interactions, confirming both the benchmark’s validity and the critical role of age-aware prompting.

📝 Abstract

Children increasingly have access to Large Language Models (LLMs), which may expose them to responses that are developmentally inappropriate or require age-sensitive safety, guidance, and boundaries. Existing LLM safety evaluations largely focus on harmful-content avoidance and do not explicitly target child-facing safety. We introduce KIDBench, a benchmark for evaluating child-facing LLM safety for ages 7--11 using a developmental-psychology-grounded LLM-as-a-Judge rubric. KIDBench contains realistic child queries across ten categories, with single-turn prompts and multi-turn child-actor simulations. We compare no-cues prompts with no child context, implicit-cues prompts that suggest a child speaker, and explicit age instructions. Implicit-cues improve scores by 9--47% across models, while explicit age adds a further 10--30% gain. Cross-lingual and cultural evaluations show uneven safety behavior across languages and country contexts. Multi-turn simulations show that child-facing response quality can degrade by 6--24% from the first to worst turn. Beyond evaluation, we introduce KIDGuardLlama, a child-safety evaluator, and KIDLlama, a child-oriented response model, showing how KIDBench supports safer child-facing AI

Problem

Research questions and friction points this paper is trying to address.

child safety

large language models

AI safety

developmental appropriateness

age-sensitive AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

child-facing AI safety

KIDBench

developmental psychology