🤖 AI Summary
This study exposes the unintended leakage of sensitive attributes (such as age, gender, and race) from low-dimensional face embeddings, e.g., 40-dimensional ArcFace/VGGFace2 features. To address this, we propose the first systematic quantification framework that integrates causal inference, disentangled feature learning, and adversarial attribution analysis, enabling gradient- and input-perturbation-based tracing of sensitive-attribute leakage. We further design an interpretable attribution mechanism and a leakage intensity metric. Experiments across mainstream face recognition models show that sensitive attributes can be predicted from the embeddings with over 89% accuracy, confirming substantial privacy leakage. To mitigate this, we introduce a lightweight debiasing fine-tuning strategy: it reduces sensitive-attribute prediction accuracy by 42% while incurring less than 0.3% degradation in identity verification performance. Our work establishes a paradigm for privacy-preserving representation learning that balances utility and fairness without architectural modification.
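The core leakage measurement the summary describes, predicting a sensitive attribute directly from identity embeddings, can be illustrated with a minimal linear-probe sketch. This is not the paper's actual framework; it uses synthetic 40-dimensional "embeddings" in which a few dimensions are deliberately correlated with a binary attribute, then fits a plain logistic-regression probe by gradient descent to show how far above chance the attribute becomes predictable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 40-dimensional "face embeddings" where a few
# dimensions weakly encode a binary sensitive attribute (e.g. gender).
n, d = 2000, 40
attr = rng.integers(0, 2, size=n)      # sensitive attribute labels
emb = rng.normal(size=(n, d))
emb[:, :3] += 1.5 * attr[:, None]      # leak the attribute into 3 dims

# Train/test split
tr, te = slice(0, 1500), slice(1500, None)

# Linear probe: logistic regression trained with plain gradient descent
w, b = np.zeros(d), 0.0
for _ in range(500):
    z = emb[tr] @ w + b
    p = 1.0 / (1.0 + np.exp(-z))       # sigmoid
    g = p - attr[tr]                   # gradient of log-loss w.r.t. z
    w -= 0.1 * (emb[tr].T @ g) / 1500
    b -= 0.1 * g.mean()

pred = (emb[te] @ w + b) > 0
acc = (pred == attr[te]).mean()
print(f"probe accuracy: {acc:.2f}")    # well above the 0.5 chance level
```

A probe accuracy far above 50% on held-out embeddings is exactly the kind of signal the study's leakage intensity metric quantifies; a debiasing fine-tune would aim to drive this number back toward chance while leaving identity-relevant dimensions intact.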