Minimizing Risk Through Minimizing Model-Data Interaction: A Protocol For Relying on Proxy Tasks When Designing Child Sexual Abuse Imagery Detection Models

📅 2025-05-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Training child sexual abuse imagery (CSAI) detection models is severely hindered by the extreme sensitivity of real CSAI data, which restricts access to authentic samples and keeps interaction between real data and learning algorithms to a minimum. Method: The paper formalizes a definition of “Proxy Tasks”, i.e., the substitute tasks used to train models for CSAI without using CSA data, reviews the current literature under this terminology, and presents a protocol that combines deliberate use of Proxy Tasks with consistent input from law enforcement agents (LEAs). Few-shot Indoor Scene Classification serves as the case study, and the final model has no weights trained on sensitive data. Contribution/Results: The final model achieves promising results on a real-world CSAI dataset. The authors describe this as the first study of Few-shot Indoor Scene Classification on CSAI.

📝 Abstract
The distribution of child sexual abuse imagery (CSAI) is an ever-growing concern of our modern world; children who suffered from this heinous crime are revictimized, and the growing amount of illegal imagery distributed overwhelms law enforcement agents (LEAs) with the manual labor of categorization. To ease this burden, researchers have explored methods for automating data triage and detection of CSAI, but the sensitive nature of the data imposes restricted access and minimal interaction between real data and learning algorithms, avoiding leaks at all costs. In observing how these restrictions have shaped the literature, we formalize a definition of "Proxy Tasks", i.e., the substitute tasks used for training models for CSAI without making use of CSA data. Under this new terminology, we review current literature and present a protocol for making conscious use of Proxy Tasks together with consistent input from LEAs to design better automation in this field. Finally, we apply this protocol to study, for the first time, the task of Few-shot Indoor Scene Classification on CSAI, showing a final model that achieves promising results on a real-world CSAI dataset whilst having no weights actually trained on sensitive data.
Problem

Research questions and friction points this paper is trying to address.

Automating detection of child sexual abuse imagery (CSAI) without direct data access
Developing proxy tasks to train models without using sensitive CSAI data
Studying Few-shot Indoor Scene Classification on CSAI as a Proxy Task
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Proxy Tasks to avoid sensitive data interaction
Presenting a protocol that pairs Proxy Tasks with consistent input from LEAs to design better automation
Few-shot Indoor Scene Classification on CSAI
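
To make the idea of few-shot classification concrete, here is a minimal sketch of a nearest-prototype classifier. This is an illustrative assumption, not the method from the paper: it presumes that embeddings come from a feature extractor trained only on non-sensitive proxy data, and it replaces those embeddings with synthetic vectors. The scene labels, the 16-dimensional features, and all function names are hypothetical.

```python
import math
import random


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = {}
    for vector, label in zip(support_embeddings, support_labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(query_embedding, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(
        prototypes,
        key=lambda label: math.dist(query_embedding, prototypes[label]),
    )


# Synthetic stand-ins for embeddings from a frozen extractor (hypothetical):
# three indoor scene classes, well-separated centers, small noise.
rng = random.Random(0)
DIM = 16
centers = {
    label: [rng.gauss(0, 5) for _ in range(DIM)]
    for label in ("bedroom", "bathroom", "living_room")
}


def noisy(center):
    """Perturb a class center to imitate one image embedding."""
    return [value + rng.gauss(0, 0.1) for value in center]


# Five support examples per class (the "few shots").
support_labels = [label for label in centers for _ in range(5)]
support = [noisy(centers[label]) for label in support_labels]

# Queries with known labels, used only to check the sketch.
query_labels = ["bathroom", "living_room", "bedroom"]
queries = [noisy(centers[label]) for label in query_labels]

prototypes = class_prototypes(support, support_labels)
predictions = [predict(query, prototypes) for query in queries]
```

In this sketch no gradient step is taken on the support examples: the only per-class state is an average of embeddings, which is one possible way to keep learned weights away from the target data.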
Thamiris Coelho
Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil
Leo S. F. Ribeiro
Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo (USP), São Carlos, Brazil
Joao Macedo
PhD Candidate
Machine Learning, Computer Vision, CSAM classification
Jefersson A. dos Santos
University of Sheffield - School of Computer Science
Computer Vision, Machine Learning, Remote Sensing, GeoAI
Sandra Avila
Professor of Computer Science, University of Campinas (Unicamp)
Machine Learning, Deep Learning, Computer Vision, Natural Language Processing