StagePilot: A Deep Reinforcement Learning Agent for Stage-Controlled Cybergrooming Simulation

📅 2026-02-04
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the growing threat of online grooming targeting adolescents, for which existing educational interventions lack tools that simulate the staged nature of such predatory behavior. To bridge this gap, the authors propose StagePilot, a dialogue agent trained with offline reinforcement learning that, for the first time, incorporates stage-wise constraints into grooming behavior modeling. The agent selects the conversation stage at each turn using a composite reward that balances the user's sentiment against proximity to the predator's goal, and it permits transitions only between adjacent stages to improve realism and interpretability. Several offline RL methods are compared in LLM-based simulations; the combination of Implicit Q-Learning (IQL) and Advantage-Weighted Actor-Critic (AWAC) gives the best balance between strategic planning and emotional coherence. That agent reaches the final stage up to 43% more frequently than the baselines while maintaining over 70% sentiment alignment.

๐Ÿ“ Abstract
Cybergrooming is an evolving threat to youth, necessitating proactive educational interventions. We propose StagePilot, an offline RL-based dialogue agent that simulates the stage-wise progression of grooming behaviors for prevention training. StagePilot selects conversational stages using a composite reward that balances user sentiment and goal proximity, with transitions constrained to adjacent stages for realism and interpretability. We evaluate StagePilot through LLM-based simulations, measuring stage completion, dialogue efficiency, and emotional engagement. Results show that StagePilot generates realistic and coherent conversations aligned with grooming dynamics. Among tested methods, the IQL+AWAC agent achieves the best balance between strategic planning and emotional coherence, reaching the final stage up to 43% more frequently than baselines while maintaining over 70% sentiment alignment.
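
The adjacent-stage constraint and the composite reward described above can be illustrated with a minimal sketch. It is not the authors' implementation. The number of stages (`NUM_STAGES`), the weighting parameter `alpha`, the sentiment range of [-1, 1], the linear definition of goal proximity, and the choice to allow staying in the current stage are all assumptions made for illustration. Stages are treated as abstract integer indices.

```python
from typing import List, Sequence

NUM_STAGES = 5  # hypothetical; the abstract does not state the number of stages


def allowed_next_stages(current: int, num_stages: int = NUM_STAGES) -> List[int]:
    """Stages reachable in one step: stay, or move to an adjacent stage.

    Allowing the agent to stay in place is an assumption of this sketch.
    """
    if not 0 <= current < num_stages:
        raise ValueError(f"stage {current} is outside 0..{num_stages - 1}")
    candidates = (current - 1, current, current + 1)
    return [s for s in candidates if 0 <= s < num_stages]


def composite_reward(
    sentiment: float,
    stage: int,
    num_stages: int = NUM_STAGES,
    alpha: float = 0.5,
) -> float:
    """Weighted sum of user sentiment and goal proximity.

    sentiment is assumed to lie in [-1, 1]; goal proximity is assumed to
    grow linearly from 0 (first stage) to 1 (final stage).
    """
    goal_proximity = stage / (num_stages - 1)
    return alpha * sentiment + (1.0 - alpha) * goal_proximity


def greedy_stage(current: int, q_values: Sequence[float]) -> int:
    """Pick the highest-valued stage among those the constraint allows."""
    reachable = allowed_next_stages(current, len(q_values))
    return max(reachable, key=lambda s: q_values[s])
```

In this sketch the constraint acts as an action mask: a learned value estimate may rank a distant stage highest, but `greedy_stage` only considers stages returned by `allowed_next_stages`, so the selected stage never skips over an intermediate one.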
Problem

Research questions and friction points this paper is trying to address.

cybergrooming
stage-controlled simulation
preventive education
dialogue agent
youth safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

offline reinforcement learning
stage-controlled dialogue
cybergrooming simulation
composite reward
IQL+AWAC