🤖 AI Summary
This paper addresses the low extraction accuracy, poor fidelity, and insufficient hiding efficiency of neural steganography in diffusion models by proposing a decoupled steganographic embedding and extraction framework based on score-function editing. Methodologically, it performs gradient-driven low-rank adaptation (LoRA) fine-tuning of the learned score function at critical timesteps of the reverse denoising process, enabling precise steganographic embedding while supporting independent, lossless extraction by multiple receivers. Key contributions include: (i) the first realization of high-fidelity (distortion imperceptible to human vision) and high-efficiency (embedding speed accelerated by several orders of magnitude) image steganography within diffusion models; (ii) a parameter-efficient fine-tuning strategy inherently compatible with multi-receiver settings; and (iii) strict preservation of the original diffusion model's behavior, both at the sample level and in population-level statistics.
📝 Abstract
Hiding data using neural networks (i.e., neural steganography) has achieved remarkable success in both discriminative classifiers and generative adversarial networks. However, the potential of data hiding in diffusion models remains relatively unexplored. Current methods fall short in extraction accuracy, model fidelity, and hiding efficiency, primarily because the hiding and extraction processes are entangled with multiple denoising diffusion steps. To address these limitations, we describe a simple yet effective approach that embeds images at specific timesteps of the reverse diffusion process by editing the learned score functions. Additionally, we introduce a parameter-efficient fine-tuning method that combines gradient-based parameter selection with low-rank adaptation to enhance model fidelity and hiding efficiency. Comprehensive experiments demonstrate that our method extracts high-quality images at human-indistinguishable levels, replicates the original model's behavior at both sample and population levels, and embeds images orders of magnitude faster than prior methods. Moreover, our method naturally supports multi-recipient scenarios through independent extraction channels.
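The combination of gradient-based parameter selection and low-rank adaptation can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: `W` stands in for one weight matrix of the score network, `grad` for a hypothetical gradient of an embedding loss at a chosen reverse-diffusion timestep, and `gradient_mask`, `lora_update`, `keep_frac`, and `rank` are names invented here for illustration. The point it demonstrates is that a masked low-rank edit touches only the selected parameters, leaving the rest of the model exactly unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one weight matrix of a score network (hypothetical shape).
W = rng.standard_normal((8, 8))

# Hypothetical gradient of the embedding loss w.r.t. W, e.g. obtained by
# backpropagating the extraction error at a chosen reverse-diffusion timestep.
grad = rng.standard_normal((8, 8))

def gradient_mask(grad, keep_frac=0.25):
    """Select the entries with the largest gradient magnitude."""
    k = int(grad.size * keep_frac)
    thresh = np.sort(np.abs(grad).ravel())[-k]  # k-th largest magnitude
    return (np.abs(grad) >= thresh).astype(grad.dtype)

def lora_update(W, A, B, mask):
    """Apply a low-rank update A @ B, restricted to the selected entries."""
    return W + mask * (A @ B)

# Low-rank factors, as in LoRA: only rank * (m + n) trainable parameters.
rank = 2
A = 0.1 * rng.standard_normal((8, rank))
B = 0.1 * rng.standard_normal((rank, 8))

mask = gradient_mask(grad)
W_edited = lora_update(W, A, B, mask)

# Entries outside the mask are bit-for-bit identical to the original weights,
# so the edit perturbs only the gradient-selected parameters.
print(np.array_equal(W_edited[mask == 0], W[mask == 0]))  # True
```

In an actual training loop the factors `A` and `B` would be optimized against the embedding loss while `W` stays frozen; the sketch only shows the structure of the masked low-rank edit.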