A Fuzzy Evaluation of Sentence Encoders on Grooming Risk Classification

📅 2025-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the detection of child sexual grooming in online chat, focusing on misclassifications that arise when predators use indirect or coded language to evade detection. It evaluates fine-tuned bi-encoders (sentence encoders) on classifying degrees of grooming risk in chats from three participant groups: law enforcement officers, real victims, and decoys. A fuzzy-theoretic framework maps human assessments of grooming behaviors to an estimated degree of grooming risk. The analysis shows that fine-tuned models fail to tag instances where the predator relies on indirect speech and coded language, and that these instances contain a higher share of out-of-vocabulary (OOV) words. The findings point to the need for more robust models that can identify coded language in noisy chat inputs.

📝 Abstract

With the advent of social media, children are becoming increasingly vulnerable to the risk of grooming in online settings. Detecting grooming instances in an online conversation poses a significant challenge as the interactions are not necessarily sexually explicit, since the predators take time to build trust and a relationship with their victim. Moreover, predators evade detection using indirect and coded language. While previous studies have fine-tuned Transformers to automatically identify grooming in chat conversations, they overlook the impact of coded and indirect language on model predictions, and how these align with human perceptions of grooming. In this paper, we address this gap and evaluate bi-encoders on the task of classifying different degrees of grooming risk in chat contexts, for three different participant groups, i.e. law enforcement officers, real victims, and decoys. Using a fuzzy-theoretic framework, we map human assessments of grooming behaviors to estimate the actual degree of grooming risk. Our analysis reveals that fine-tuned models fail to tag instances where the predator uses indirect speech pathways and coded language to evade detection. Further, we find that such instances are characterized by a higher presence of out-of-vocabulary (OOV) words in samples, causing the model to misclassify. Our findings highlight the need for more robust models to identify coded language from noisy chat inputs in grooming contexts.
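The abstract links misclassified samples to a higher presence of OOV words. One simple way to quantify that is the fraction of tokens in a message that are missing from a reference vocabulary. The sketch below is only an illustration under stated assumptions: the regex tokenizer, the toy vocabulary, and the function names `tokenize` and `oov_rate` are hypothetical and are not taken from the paper, which may measure OOV words against a specific model vocabulary.

```python
import re


def tokenize(text):
    """Lowercase the text and extract alphanumeric tokens.

    Assumption: a plain regex tokenizer stands in for whatever
    tokenizer the evaluated encoders actually use.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def oov_rate(text, vocabulary):
    """Return the fraction of tokens in `text` that are not in `vocabulary`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid division by zero on empty messages
    missing = sum(1 for tok in tokens if tok not in vocabulary)
    return missing / len(tokens)


# Hypothetical toy vocabulary, for illustration only.
vocab = {"how", "are", "you", "today", "see", "later"}

plain = "how are you today"   # standard spelling
noisy = "hru 2day c u l8r"    # chat shorthand, absent from the toy vocabulary

print(oov_rate(plain, vocab))  # 0.0
print(oov_rate(noisy, vocab))  # 1.0
```

Comparing the mean OOV rate of correctly and incorrectly classified samples would be one way to check the relationship the abstract describes.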

Problem

Research questions and friction points this paper is trying to address.

Evaluating sentence encoders for grooming risk classification

Addressing indirect and coded language in grooming detection

Mapping human perception to grooming risk assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuzzy-theoretic framework evaluates grooming risk

Bi-encoders classify grooming risk levels

Analyzes indirect speech and coded language
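To illustrate what a fuzzy-theoretic mapping from a human assessment to a degree of risk can look like, here is a minimal sketch using triangular membership functions. Everything in it is an assumption made for illustration: the three risk sets, their breakpoints, the normalized rating in [0, 1], and the names `triangular` and `fuzzy_risk` are hypothetical, and the paper's actual framework may be defined differently.

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_risk(rating):
    """Map a normalized human rating in [0, 1] to memberships in three risk sets.

    Assumption: breakpoints are chosen so that 'low' peaks at 0.0,
    'moderate' at 0.5, and 'high' at 1.0.
    """
    return {
        "low": triangular(rating, -0.5, 0.0, 0.5),
        "moderate": triangular(rating, 0.0, 0.5, 1.0),
        "high": triangular(rating, 0.5, 1.0, 1.5),
    }


def dominant_label(rating):
    """Return the risk set with the highest membership for this rating."""
    memberships = fuzzy_risk(rating)
    return max(memberships, key=memberships.get)


print(fuzzy_risk(0.25))     # {'low': 0.5, 'moderate': 0.5, 'high': 0.0}
print(dominant_label(0.9))  # high
```

Unlike a hard threshold, a rating of 0.25 belongs partly to both the low and moderate sets, which is the kind of graded judgment a fuzzy framework is meant to capture.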
