🤖 AI Summary
To address privacy leakage risks arising from using real sensitive data in document image classification, this paper proposes a Conditional Latent Diffusion Model (CLDM) framework constrained by differential privacy (DP), enabling high-fidelity, high-utility private synthetic document image generation. Methodologically, the approach jointly models class and layout conditions, integrates the DP-Adam optimizer with the DPDM and DP-Promise training mechanisms, and supports both pretraining and fine-grained private fine-tuning. Crucially, the DP constraint is imposed directly in the latent space, providing (ε)-differential privacy guarantees (ε ∈ {1, 5, 10}) while largely preserving generation quality. Experiments on RVL-CDIP and Tobacco3482 show that the synthesized images are visually realistic and yield markedly higher downstream classification accuracy than baselines that apply DP directly to the downstream model, particularly in low-data regimes.
📝 Abstract
As deep learning-based, data-driven information extraction systems become increasingly integrated into modern document processing workflows, a primary concern is the risk of malicious leakage of sensitive private data from these systems. While some recent works have explored Differential Privacy (DP) to mitigate these privacy risks, DP-based training is known to cause significant performance degradation and to impose several limitations on standard training procedures, making its direct application to downstream tasks both difficult and costly. In this work, we aim to address these challenges in the context of document image classification by substituting real private data with a synthetic counterpart. In particular, we propose to use conditional latent diffusion models (LDMs) in combination with DP to generate class-specific synthetic document images under strict privacy constraints, which can then be used to train a downstream classifier with standard training procedures. We investigate our approach under various pretraining setups, including unconditional, class-conditional, and layout-conditional pretraining, in combination with multiple private training strategies such as class-conditional and per-label private fine-tuning with the DPDM and DP-Promise algorithms. We evaluate it on two well-known document benchmark datasets, RVL-CDIP and Tobacco3482, and show that it can generate useful and realistic document samples across various document types and privacy levels ($\varepsilon \in \{1, 5, 10\}$). Lastly, we show that our approach achieves substantial performance improvements in downstream evaluations on small-scale datasets, compared to the direct application of DP-Adam.
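The private training strategies mentioned above (DP-Adam, DPDM, DP-Promise) all rest on the same core privatization step: clip each per-sample gradient to a fixed norm, sum, and add calibrated Gaussian noise before the optimizer update. The sketch below illustrates that step in plain Python; it is a simplified illustration under assumed defaults (function name, `clip_norm`, and `noise_multiplier` are ours, not from the paper), not the authors' implementation.

```python
import math
import random

def dp_gradient_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Privatize a batch of per-sample gradients (DP-SGD/DP-Adam style).

    1. Clip each per-sample gradient to L2 norm <= clip_norm.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * clip_norm.
    4. Average over the batch.
    """
    rng = random.Random(seed)
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        scale = min(1.0, clip_norm / (norm + 1e-12))  # clip, never amplify
        for i, x in enumerate(grad):
            summed[i] += x * scale
    sigma = noise_multiplier * clip_norm  # noise scaled to the clipping bound
    return [(s + rng.gauss(0.0, sigma)) / len(per_sample_grads) for s in summed]
```

The privatized gradient would then be handed to a standard optimizer (Adam, in the case of DP-Adam); the privacy budget $\varepsilon$ consumed over training is determined by `noise_multiplier`, the sampling rate, and the number of steps, typically tracked by a moments/RDP accountant.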