Identity-Decoupled Anonymization for Visual Evidence in Multi-modal Retrieval-Augmented Generation

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses privacy in multimodal retrieval-augmented generation systems, where retrieved images often contain identifiable faces. Existing anonymization techniques either degrade the non-identity visual semantics that downstream tasks depend on or fail to provide principled privacy guarantees. The authors propose inserting a generative anonymization module between the retrieval and generation stages. The module combines an identity-attribute disentangled variational encoder, a manifold-aware rejection sampler for identity replacement, and a conditional latent diffusion generator distilled into a latent consistency model; together these replace identity features while preserving spatially structured attributes. Privacy is enforced by a hinge-based loss over an ensemble of face recognition models, which stops optimization once identity similarity falls below the impostor-regime threshold. The design aims to remove identifiability while keeping the visual cues needed for downstream reasoning and supporting low-latency deployment.

📝 Abstract

Multi-modal retrieval-augmented generation (MRAG) systems retrieve visual evidence from large image corpora to ground the responses of large multi-modal models, yet the retrieved images frequently contain human faces whose identities constitute sensitive personal information. Existing anonymization techniques either destroy the non-identity visual cues that downstream reasoning depends on or fail to provide principled privacy guarantees. We propose Identity-Decoupled MRAG, a framework that interposes a generative anonymization module between retrieval and generation. Our approach consists of three components: (i) a disentangled variational encoder that factorizes each face into an identity code and a spatially structured attribute code, regularized by a mutual-information penalty and a gradient-based independence term; (ii) a manifold-aware rejection sampler that replaces the identity code with a synthetic one guaranteed to be both distinct from the original and realistic; and (iii) a conditional latent diffusion generator that synthesizes the anonymized face from the replacement identity and the preserved attributes, distilled into a latent consistency model for low-latency deployment. Privacy is enforced through a multi-oracle ensemble of face recognition models with a hinge-based loss that halts optimization once identity similarity drops below the impostor-regime threshold.
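
The hinge-based privacy term described at the end of the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the function names, the use of cosine similarity, averaging over the ensemble, and the representation of each face recognition oracle as a plain callable that maps an image to an embedding vector.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
# Hypothetical oracle type: any function mapping an image to an identity embedding.
Oracle = Callable[[Vector], Vector]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hinge_privacy_loss(
    original: Vector,
    anonymized: Vector,
    oracles: Sequence[Oracle],
    threshold: float,
) -> float:
    """Mean over oracles of max(0, similarity - threshold).

    Assumption: `threshold` stands in for the impostor-regime threshold.
    Once every oracle scores the pair below it, the loss is exactly zero,
    so a gradient-based optimizer receives no further signal from this term.
    """
    if not oracles:
        raise ValueError("at least one oracle is required")
    penalties = []
    for embed in oracles:
        similarity = cosine_similarity(embed(original), embed(anonymized))
        penalties.append(max(0.0, similarity - threshold))
    return sum(penalties) / len(penalties)
```

The hinge form is what lets the term switch off: a pair that every oracle already scores below the threshold contributes nothing, so the anonymized face is not pushed further from the original than the threshold requires.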

Problem

Research questions and friction points this paper is trying to address.

multi-modal retrieval-augmented generation

visual evidence

face anonymization

privacy preservation

identity information

Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled representation

generative anonymization

retrieval-augmented generation

latent diffusion model

privacy-preserving AI

🔎 Similar Papers

UniRAG: Universal Retrieval Augmentation for Large Vision Language Models

2024-05-16, Citations: 2