CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

📅 2026-05-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current safety mechanisms in large language models (LLMs) are predominantly designed from an adult perspective, relying on rejection-based interception that often disrupts dialogue and overlooks the developmental characteristics and emotional needs of adolescents. This work proposes CR4T, a youth-oriented LLM safety framework that reframes safety as a developmentally informed response generation task. CR4T employs a lightweight, model-agnostic risk detection module coupled with domain-conditioned text rewriting to transform unsafe or rejection-based responses into age-appropriate, guidance-oriented replies while preserving user intent. Experimental results demonstrate that CR4T significantly reduces both unsafe and refusal outputs, minimizes interference with natural conversation flow, and enhances dialogue continuity and user experience for adolescent users, all while maintaining robust safety guarantees.
📝 Abstract
Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems.
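The detect-then-rewrite flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword lists, domain names, guidance strings, and the functions `detect`, `template_rewriter`, and `safeguard` are all hypothetical stand-ins. A real system would presumably use a trained risk classifier and an LLM-based rewriter conditioned on the detected domain. The sketch only shows the selective control flow: acceptable responses pass through untouched, while refusals and unsafe responses are routed to a domain-conditioned rewriter.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy lexicons; a real detector would be a trained classifier.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to")
RISK_KEYWORDS = {
    "self_harm": ("hurt myself", "self-harm"),
    "substances": ("get drunk", "vape"),
}
# Hypothetical per-domain guidance used to condition the rewrite.
DOMAIN_GUIDANCE = {
    "self_harm": "Acknowledge feelings and point to a trusted adult or helpline.",
    "substances": "Explain risks plainly and suggest talking to a trusted adult.",
    "general": "Stay supportive and offer a safe next step.",
}


@dataclass
class Verdict:
    needs_rewrite: bool
    reason: Optional[str] = None  # "refusal" or "unsafe"
    domain: Optional[str] = None  # key into DOMAIN_GUIDANCE


def classify_domain(text: str) -> Optional[str]:
    """Return the first risk domain whose keywords appear in the text."""
    lowered = text.lower()
    for domain, words in RISK_KEYWORDS.items():
        if any(w in lowered for w in words):
            return domain
    return None


def detect(prompt: str, response: str) -> Verdict:
    """Lightweight detection: flag refusal-style or unsafe responses."""
    if any(m in response.lower() for m in REFUSAL_MARKERS):
        # A refusal carries no topic, so infer the domain from the prompt.
        return Verdict(True, "refusal", classify_domain(prompt) or "general")
    domain = classify_domain(response)
    if domain is not None:
        return Verdict(True, "unsafe", domain)
    return Verdict(False)


Rewriter = Callable[[str, str, Verdict], str]


def template_rewriter(prompt: str, response: str, verdict: Verdict) -> str:
    """Placeholder rewriter; stands in for an LLM call given domain guidance."""
    return f"[{verdict.reason}/{verdict.domain}] {DOMAIN_GUIDANCE[verdict.domain]}"


def safeguard(prompt: str, response: str,
              rewriter: Rewriter = template_rewriter) -> str:
    """Rewrite only flagged responses; leave acceptable ones unchanged."""
    verdict = detect(prompt, response)
    if not verdict.needs_rewrite:
        return response
    return rewriter(prompt, response, verdict)
```

Because `safeguard` takes the rewriter as a parameter and only sees the prompt and response text, the wrapper stays independent of whichever model produced the response, which is one way to read the "model-agnostic" property claimed above.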
Problem

Research questions and friction points this paper is trying to address.

adolescent safety
large language models
refusal-oriented suppression
developmental vulnerabilities
human-centered AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

adolescent LLM safety
response rewriting
developmentally aligned AI
guardrail framework
model-agnostic safeguarding