AI Summary
This study addresses the growing threat of online grooming targeting adolescents, for which existing educational interventions lack tools that simulate the staged nature of such predatory behavior. To bridge this gap, the authors propose StagePilot, a dialogue agent trained with offline reinforcement learning that, for the first time, incorporates stage-wise constraints into grooming behavior modeling. The agent selects conversation stages based on the user's emotional state and proximity to the predator's goal, permitting transitions only between adjacent stages to enhance realism and interpretability. The approach combines Implicit Q-Learning (IQL) with Advantage Weighted Actor-Critic (AWAC) and uses large language models for simulation-based evaluation. In these simulations, the IQL+AWAC agent reaches the final grooming stage up to 43% more frequently than baseline methods while maintaining over 70% sentiment alignment, balancing strategic planning with emotional coherence.
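For readers unfamiliar with the two algorithms named above, the following minimal sketch shows their standard building blocks: the asymmetric (expectile) loss that IQL uses to fit its value function, and the exponentiated-advantage weight that AWAC applies to the policy's log-likelihood. The source does not describe how StagePilot combines them, so the function names, the default `tau` and `temperature`, and the weight cap `max_weight` are assumptions for illustration only.

```python
import math


def expectile_loss(diff: float, tau: float = 0.7) -> float:
    """IQL-style expectile loss |tau - 1(diff < 0)| * diff**2, with diff = Q(s, a) - V(s).

    With tau > 0.5, under-estimates of V (diff >= 0) are penalized more
    heavily than over-estimates, pushing V toward an upper expectile of Q.
    """
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff ** 2


def advantage_weight(advantage: float, temperature: float = 1.0,
                     max_weight: float = 100.0) -> float:
    """AWAC-style weight exp(advantage / temperature), capped at max_weight.

    The cap is an assumed numerical safeguard, not something stated in the source.
    """
    scaled = advantage / temperature
    # Compare in log space first so math.exp never overflows.
    if scaled >= math.log(max_weight):
        return max_weight
    return math.exp(scaled)
```

In a full agent these scalars would be computed per transition in an offline dataset and averaged into the value loss and the weighted policy loss, respectively.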
Abstract
Cybergrooming is an evolving threat to youth, necessitating proactive educational interventions. We propose StagePilot, an offline RL-based dialogue agent that simulates the stage-wise progression of grooming behaviors for prevention training. StagePilot selects conversational stages using a composite reward that balances user sentiment and goal proximity, with transitions constrained to adjacent stages for realism and interpretability. We evaluate StagePilot through LLM-based simulations, measuring stage completion, dialogue efficiency, and emotional engagement. Results show that StagePilot generates realistic and coherent conversations aligned with grooming dynamics. Among tested methods, the IQL+AWAC agent achieves the best balance between strategic planning and emotional coherence, reaching the final stage up to 43% more frequently than baselines while maintaining over 70% sentiment alignment.
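The stage-selection mechanism described in the abstract can be illustrated with a small sketch: restrict the candidate stages to the current stage and its neighbors, then pick the candidate with the highest estimated value, where values are learned from a reward that mixes sentiment and goal proximity. Everything specific here is hypothetical: the number of stages (`NUM_STAGES`), the mixing weight `alpha`, the assumption that both signals are scaled to [0, 1], and the assumption that the agent may stay in place or step backward are not stated in the abstract.

```python
NUM_STAGES = 6  # hypothetical; the abstract does not state the number of stages


def allowed_next_stages(current: int, num_stages: int = NUM_STAGES) -> list[int]:
    """Stages reachable from `current`: stay, or move to an adjacent stage."""
    candidates = (current - 1, current, current + 1)
    return [s for s in candidates if 0 <= s < num_stages]


def composite_reward(sentiment: float, goal_proximity: float,
                     alpha: float = 0.5) -> float:
    """Convex combination of the two signals (assumed form of the reward)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sentiment + (1.0 - alpha) * goal_proximity


def select_stage(current: int, q_values: list[float],
                 num_stages: int = NUM_STAGES) -> int:
    """Greedy choice over the allowed stages only; other stages are masked out."""
    allowed = allowed_next_stages(current, num_stages)
    return max(allowed, key=lambda s: q_values[s])
```

Masking the action set this way means a high value estimate for a distant stage cannot cause a jump: from stage 2, only stages 1, 2, and 3 are ever considered, which is what makes the resulting trajectories easy to inspect stage by stage.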