StagePilot: A Deep Reinforcement Learning Agent for Stage-Controlled Cybergrooming Simulation

📅 2026-02-04

🤖 AI Summary

This study addresses the growing threat of online grooming aimed at adolescents. Existing educational interventions lack tools that can simulate the staged progression of such predatory behavior. To close this gap, the authors propose StagePilot, a dialogue agent trained with offline reinforcement learning that incorporates stage-wise constraints into grooming-behavior modeling for the first time. The agent chooses the next conversational stage based on the user's emotional state and on proximity to the predator's goal. It may only move between adjacent stages, which improves realism and interpretability. Several offline RL methods are compared, and the combination of Implicit Q-Learning (IQL) with Advantage Weighted Actor-Critic (AWAC) performs best. Evaluation uses large language models to simulate the conversations. In these simulations, the IQL+AWAC agent reaches the final grooming stage up to 43% more often than baseline methods while keeping sentiment alignment above 70%. The simulated grooming is therefore both strategic and emotionally coherent.
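
The summary names IQL and AWAC but does not spell out the objectives StagePilot optimizes. The NumPy sketch below is a rough illustration of the two standard ingredients, assuming discrete stage actions. It computes the IQL expectile loss used to fit the value function, and the AWAC-style exponentiated-advantage weights used to train the policy on logged stage choices. The function names and the values of `tau`, `beta`, and the weight cap are illustrative assumptions, not settings reported for StagePilot.

```python
import numpy as np


def expectile_loss(q_values: np.ndarray, v_values: np.ndarray, tau: float = 0.7) -> float:
    """IQL-style value loss: asymmetric squared error on u = Q(s, a) - V(s).

    Positive residuals are weighted by tau and negative ones by (1 - tau), so with
    tau > 0.5 the value estimate V(s) is pushed toward an upper expectile of Q.
    tau = 0.7 is an assumed, illustrative value.
    """
    u = q_values - v_values
    weight = np.where(u < 0.0, 1.0 - tau, tau)
    return float(np.mean(weight * u ** 2))


def advantage_weights(q_values: np.ndarray, v_values: np.ndarray,
                      beta: float = 3.0, max_weight: float = 100.0) -> np.ndarray:
    """Exponentiated advantages exp(beta * (Q - V)).

    The weights are capped at max_weight so a few large advantages cannot
    dominate the update. beta and max_weight are assumed values.
    """
    advantage = q_values - v_values
    return np.minimum(np.exp(beta * advantage), max_weight)


def actor_loss(log_probs: np.ndarray, q_values: np.ndarray, v_values: np.ndarray,
               beta: float = 3.0) -> float:
    """Advantage-weighted regression on logged actions.

    log_probs holds log pi(a_t | s_t) for the stage actions actually taken in the
    offline dataset. Actions with higher advantage receive larger weights, so the
    policy imitates the better-than-average choices more strongly.
    """
    weights = advantage_weights(q_values, v_values, beta)
    return float(-np.mean(weights * log_probs))
```

In a full implementation Q, V, and the policy would be neural networks, and these losses would be minimized by gradient descent on batches from the offline dataset. The NumPy version only makes the shape of each objective concrete. The key design property is that neither loss queries Q on actions outside the dataset, which is what makes the method suitable for offline training.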

📝 Abstract

Cybergrooming is an evolving threat to youth, necessitating proactive educational interventions. We propose StagePilot, an offline RL-based dialogue agent that simulates the stage-wise progression of grooming behaviors for prevention training. StagePilot selects conversational stages using a composite reward that balances user sentiment and goal proximity, with transitions constrained to adjacent stages for realism and interpretability. We evaluate StagePilot through LLM-based simulations, measuring stage completion, dialogue efficiency, and emotional engagement. Results show that StagePilot generates realistic and coherent conversations aligned with grooming dynamics. Among tested methods, the IQL+AWAC agent achieves the best balance between strategic planning and emotional coherence, reaching the final stage up to 43% more frequently than baselines while maintaining over 70% sentiment alignment.
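
The abstract does not give the exact reward formula, the number of stages, or the precise definition of adjacency. The sketch below therefore relies on several assumptions. It uses a hypothetical six-stage sequence, a user sentiment score in [-1, 1], goal proximity measured as the normalized stage index, and a mixing weight `alpha`. It also treats "adjacent" as meaning the agent may stay in its current stage or move one stage forward or back. Under those assumptions, it shows how a composite reward could be computed and how an action mask restricts greedy stage selection to adjacent stages. All names here are hypothetical.

```python
import numpy as np

NUM_STAGES = 6  # hypothetical; the abstract does not state the number of stages


def composite_reward(sentiment: float, stage: int, alpha: float = 0.5,
                     num_stages: int = NUM_STAGES) -> float:
    """Hypothetical composite reward: weighted mix of sentiment and goal proximity.

    sentiment is assumed to lie in [-1, 1]; proximity is 0 at the first stage
    and 1 at the final stage.
    """
    proximity = stage / (num_stages - 1)
    return alpha * sentiment + (1.0 - alpha) * proximity


def adjacent_stage_mask(current: int, num_stages: int = NUM_STAGES) -> np.ndarray:
    """Boolean mask of stages reachable in one step: stay, or move one stage either way."""
    stages = np.arange(num_stages)
    return np.abs(stages - current) <= 1


def select_stage(q_values: np.ndarray, current: int) -> int:
    """Greedy stage choice over Q-values, restricted to stages adjacent to `current`."""
    mask = adjacent_stage_mask(current, len(q_values))
    masked = np.where(mask, q_values, -np.inf)  # non-adjacent stages can never win
    return int(np.argmax(masked))
```

Consider Q-values `[0.1, 0.9, 0.2, 0.8, 0.3, 0.95]` with the agent currently in stage 2. Here `select_stage` returns 1: stage 5 has the highest value, but it is not adjacent to stage 2 and is masked out. Masking at selection time, rather than penalizing jumps in the reward, guarantees that the constraint is never violated. It also keeps each transition easy to interpret.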

Problem

Research questions and friction points this paper is trying to address.

cybergrooming

stage-controlled simulation

preventive education

dialogue agent

youth safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

offline reinforcement learning

stage-controlled dialogue

cybergrooming simulation