🤖 AI Summary
This work addresses the research gap in adapting vision autoregressive models (VARs) to downstream tasks, particularly medical image generation, under differential privacy (DP) constraints. Unlike diffusion models (DMs), which benefit from mature adaptation techniques, VARs lack established DP-compliant fine-tuning methods. We systematically design and evaluate multiple DP-VAR adaptation strategies, including DP-SGD fine-tuning, DP-aware transfer learning, and optimized noise injection into latent representations. Experiments on large-scale benchmarks reveal that while VARs outperform DMs in non-private settings, their performance degrades substantially under DP, exposing a heightened sensitivity to gradient perturbation that had not previously been characterized. Our study bridges the theoretical and practical void in DP-adapted VARs, providing the first reproducible framework for privacy-preserving medical AI and identifying critical directions for robustness improvement.
📝 Abstract
The Vision AutoRegressive model (VAR) was recently introduced as an alternative to Diffusion Models (DMs) in the image generation domain. In this work, we focus on its adaptations, which aim to fine-tune pre-trained models to perform specific downstream tasks, such as medical data generation. While many such techniques exist for DMs, adaptations for VAR remain underexplored. Similarly, differentially private (DP) adaptations, i.e., ones that aim to preserve the privacy of the adaptation data, have been extensively studied for DMs, while VAR lacks such solutions. In our work, we implement and benchmark many strategies for VAR and compare them to state-of-the-art DM adaptation strategies. We observe that VAR outperforms DMs for non-DP adaptations; however, the performance of DP adaptations suffers, which necessitates further research on private adaptations for VAR. Code is available at https://github.com/sprintml/finetuning_var_dp.
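The DP-SGD fine-tuning evaluated here rests on two mechanisms: clipping each per-sample gradient to a fixed norm and adding calibrated Gaussian noise before averaging. A minimal NumPy sketch of one aggregation step follows; the function name, defaults, and plain-array representation are illustrative assumptions, not the paper's actual implementation (which operates on model parameter gradients, e.g., via a library such as Opacus).

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD aggregation step (illustrative sketch).

    Each per-sample gradient is rescaled so its L2 norm is at most
    `clip_norm`, the clipped gradients are summed, Gaussian noise with
    standard deviation `noise_multiplier * clip_norm` is added, and the
    result is averaged over the batch.
    """
    rng = np.random.default_rng(rng)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Scale down (never up) so the per-sample contribution is bounded.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        clipped.append(g * scale)
    total = np.sum(clipped, axis=0)
    # Noise is calibrated to the clipping bound, the sensitivity of the sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_sample_grads)
```

The clipping bound is what makes the Gaussian noise sufficient for a DP guarantee; it is also the gradient perturbation to which, per the results above, VAR appears especially sensitive.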