🤖 AI Summary
Current AI-based semantic de-identification methods lack a systematic, quantitative, and formally grounded evaluation framework. This work proposes "contrastive privacy", a model- and modality-agnostic formal privacy definition that enables automated, annotation-free assessment by comparing the distances between de-identified samples and other instances from the original corpus in a semantic embedding space (e.g., CLIP). By bringing contrastive principles into privacy measurement, the approach supports quantitative validation of de-identification efficacy across modalities without relying on manual labels or on any specific de-identification mechanism. It provides guarantees under ideal conditions and can still be operationalized when the semantic measure is imperfect. Empirical evaluation across 34 combinations of image models and 15 text models demonstrates its use for both overall success-rate estimation and identification of specific failure cases.
📝 Abstract
To sanitize specific concepts from imagery and text, privacy mechanisms with formal guarantees are often eschewed in practice in favor of more intuitive techniques. AI-based sanitization is poised to grow in popularity because it can work with the semantics of natural language concepts; e.g., a prompt to "remove faces, clothing, and body shape". Many such approaches exist, both commercially and in prior work, but their evaluation has been bespoke and without formal guarantees.
To fill this gap, we propose contrastive privacy, a formal definition of privacy that provides a systematic, quantitative, and semantically interpretable test of sanitized media. It is independent of the model and mechanism used and operates across multiple media modalities. Contrastive privacy provides guarantees under ideal conditions, and we show how to operationalize the definition with imperfect measures of semantics, provided by models like CLIP, that can connect concepts latently. Notably, the algorithm contrasts sanitized media with other images from the same corpus to arrive at a determination; no manual labeling is involved.
In our experiments, we apply our privacy test to both images and text using frontier models: some generate concepts to sanitize and others perform the sanitization. With our test, we quantify sanitization success across 34 combinations of models on images and for 15 models on text. The approach not only quantifies success overall but also identifies specific failures in a sanitized corpus. Further, it is independent of the mechanism used for sanitization, whether darkening pixels, blurring, or applying more advanced means of obfuscation.
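The corpus-contrast idea described above can be illustrated with a toy sketch. The abstract does not state the formal definition, so the decision rule here is an assumption made only for illustration: a sanitized item is flagged as a failure when its embedding is strictly closer to its own source than to every other item in the corpus. The function names `cosine` and `contrastive_report` are hypothetical, and the embeddings are plain lists of floats standing in for vectors that a model such as CLIP might produce.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_report(sanitized, originals):
    """Return (success_rate, failure_indices) for a sanitized corpus.

    sanitized[i] is the embedding of the sanitized version of originals[i].
    Assumed toy rule (not the paper's definition): item i fails when its
    source is strictly more similar to it than every other corpus item.
    """
    failures = []
    for i, s in enumerate(sanitized):
        own = cosine(s, originals[i])
        others = [cosine(s, o) for j, o in enumerate(originals) if j != i]
        if all(own > other for other in others):
            failures.append(i)
    success_rate = 1 - len(failures) / len(sanitized)
    return success_rate, failures


if __name__ == "__main__":
    originals = [[1.0, 0.0], [0.0, 1.0]]
    # Item 0 ends up equidistant from both originals; item 1 is left unchanged.
    sanitized = [[1.0, 1.0], [0.0, 1.0]]
    print(contrastive_report(sanitized, originals))  # (0.5, [1])
```

Under this assumed rule, the output gives both an overall success rate and the indices of the specific items that failed, mirroring the two kinds of results the abstract reports. No labels are needed, only embeddings of the sanitized items and of the corpus.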