🤖 AI Summary
Training models to detect child sexual abuse imagery (CSAI) is severely hindered by the extreme sensitivity of the data, which precludes direct access to authentic samples during model development. Method: This paper proposes a privacy-first proxy-task modeling paradigm. It formally defines “Proxy Tasks”—substitute, non-sensitive learning objectives—and co-designs them with consistent input from law enforcement agencies (LEA-in-the-loop); few-shot indoor scene classification serves as a representative proxy task, enabling knowledge transfer via cross-domain adaptation and few-shot learning. Crucially, all model weights are trained exclusively on non-sensitive proxy data—never exposed to any CSAI sample—thereby eliminating data-leakage risk from training. Contribution/Results: Evaluated on a real-world CSAI dataset, the approach achieves promising detection performance, demonstrating for the first time that few-shot indoor scene classification can support CSAI triage with no weights trained on sensitive data.
📝 Abstract
The distribution of child sexual abuse imagery (CSAI) is an ever-growing concern of our modern world; children who suffered from this heinous crime are revictimized, and the growing amount of illegal imagery distributed overwhelms law enforcement agents (LEAs) with the manual labor of categorization. To ease this burden, researchers have explored methods for automating data triage and detection of CSAI, but the sensitive nature of the data imposes restricted access and minimal interaction between real data and learning algorithms, avoiding leaks at all costs. Observing how these restrictions have shaped the literature, we formalize a definition of "Proxy Tasks", i.e., the substitute tasks used to train models for CSAI without making use of CSA data. Under this new terminology we review the current literature and present a protocol for making conscious use of Proxy Tasks together with consistent input from LEAs to design better automation in this field. Finally, we apply this protocol to study -- for the first time -- the task of Few-shot Indoor Scene Classification on CSAI, showing a final model that achieves promising results on a real-world CSAI dataset whilst having no weights actually trained on sensitive data.
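The paper itself does not include code; purely as a rough illustration of the few-shot proxy-task idea (the function name and the use of fixed embedding vectors are hypothetical stand-ins, not the authors' implementation), a nearest-prototype classifier in the style of prototypical networks could be sketched as follows. A backbone trained only on non-sensitive proxy data (e.g., generic indoor-scene photos) would produce the embeddings; at deployment, a few labeled support examples define class prototypes, and queries are assigned to the nearest one, so no model weights are ever fit to sensitive data.

```python
import numpy as np

def prototype_classify(support, support_labels, queries):
    """Few-shot nearest-prototype classification.

    support        : (n_support, d) embeddings of the few labeled examples
    support_labels : (n_support,) integer class labels
    queries        : (n_query, d) embeddings to classify

    Embeddings are assumed to come from a backbone trained exclusively
    on non-sensitive proxy data; classification here needs no training.
    """
    classes = np.unique(support_labels)
    # One prototype per class: the mean of its support embeddings.
    prototypes = np.stack(
        [support[support_labels == c].mean(axis=0) for c in classes]
    )
    # Euclidean distance from every query to every prototype.
    dists = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=-1)
    # Assign each query the label of its nearest prototype.
    return classes[dists.argmin(axis=1)]

# Toy usage with 2-D stand-in "embeddings" forming two clear clusters.
support = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 9.0]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[1.0, 0.0], [9.0, 10.0]])
preds = prototype_classify(support, labels, queries)  # -> [0, 1]
```

The design point mirrored here is that only the embedding backbone ever learns, and it learns on proxy data; the sensitive-domain "adaptation" is reduced to averaging a handful of support embeddings at inference time.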