CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current safety mechanisms in large language models (LLMs) are designed largely from an adult perspective and rely on refusal-based interception, which can create conversational dead-ends and overlooks the developmental characteristics and emotional needs of adolescents. This work proposes CR4T (Critique-and-Revise-for-Teenagers), an adolescent-oriented LLM safety framework that treats safety as a developmentally informed response generation task. CR4T couples a lightweight, model-agnostic risk detection module with domain-conditioned text rewriting to transform unsafe or refusal-style responses into age-appropriate, guidance-oriented replies while preserving benign user intent. Experimental results show that targeted rewriting substantially reduces both unsafe and refusal-oriented outputs while avoiding unnecessary intervention on acceptable interactions.

📝 Abstract

Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems.
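
The detect-then-rewrite flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword lists, the domain names, the guidance strings, and the names `detect`, `guard`, and `stub_rewriter` are all hypothetical, and a real system would presumably use a trained risk classifier and an LLM-backed rewriter. The sketch only shows the selective control flow: acceptable responses pass through untouched, while unsafe or refusal-style responses are sent to a rewriter conditioned on the detected risk domain.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword lists; a real detector would likely be a trained classifier.
RISK_KEYWORDS = {
    "self_harm": ("hurt myself", "end my life"),
    "substances": ("get drunk", "buy a vape"),
}
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to")

# Hypothetical per-domain guidance used to condition the rewrite.
DOMAIN_GUIDANCE = {
    "self_harm": "Respond with empathy and point to trusted adults and support lines.",
    "substances": "Explain health risks plainly and suggest safer choices.",
    None: "Answer the benign question helpfully instead of refusing.",
}


@dataclass
class Detection:
    domain: Optional[str]  # flagged risk domain, or None if nothing was flagged
    is_refusal: bool       # True if the response looks like a refusal

    @property
    def needs_rewrite(self) -> bool:
        return self.domain is not None or self.is_refusal


def detect(prompt: str, response: str) -> Detection:
    """Toy risk detection: keyword match over the prompt and response together."""
    text = f"{prompt} {response}".lower()
    domain = next(
        (d for d, kws in RISK_KEYWORDS.items() if any(k in text for k in kws)),
        None,
    )
    is_refusal = any(m in response.lower() for m in REFUSAL_MARKERS)
    return Detection(domain, is_refusal)


def guard(prompt: str, response: str,
          rewrite: Callable[[str, str, str], str]) -> str:
    """Pass acceptable responses through; rewrite only flagged ones."""
    det = detect(prompt, response)
    if not det.needs_rewrite:
        return response  # no intervention on acceptable interactions
    instruction = DOMAIN_GUIDANCE[det.domain]
    return rewrite(instruction, prompt, response)


def stub_rewriter(instruction: str, prompt: str, response: str) -> str:
    """Placeholder for an LLM rewriting call; it only echoes the instruction."""
    return f"[rewrite requested: {instruction}]"
```

Because `guard` takes the rewriter as a plain callable, the same wrapper could sit in front of any base model, which is one way to read the "model-agnostic" claim; swapping `stub_rewriter` for a real model call would not change the control flow.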

Problem

Research questions and friction points this paper is trying to address.

adolescent safety

large language models

refusal-oriented suppression

developmental vulnerabilities

human-centered AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

adolescent LLM safety

response rewriting

developmentally aligned AI