🤖 AI Summary
Existing text-guided image-to-image (I2I) steganography methods rely on model fine-tuning and struggle to ensure both structural invisibility and semantic consistency at the same time.
Method: This paper proposes a training-free framework for generating optical-illusion steganographic images, built entirely on pretrained text-to-image diffusion models. It introduces a novel frequency-domain phase migration mechanism that dynamically and progressively transfers the phase spectrum during the diffusion process, coupled with an asynchronous phase control strategy, to seamlessly embed structural priors from a reference image into the text-described scene. The approach integrates phase-spectrum migration, training-free I2I translation, and text-conditioned feature alignment.
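The core phase-migration idea can be sketched as a frequency-domain swap: keep the magnitude spectrum of the generated feature map but blend its phase toward the reference's phase. The function below is an illustrative NumPy sketch, not the paper's implementation; in PTDiffusion this operates on diffusion U-Net features along the denoising trajectory, and the blend weight `alpha` is assumed here as a stand-in for the progressive transfer schedule.

```python
import numpy as np

def phase_transfer(gen_feat, ref_feat, alpha):
    """Blend the phase spectrum of `gen_feat` toward that of `ref_feat`
    while keeping the generated magnitude spectrum.
    alpha in [0, 1]: 0 keeps the generated phase, 1 fully adopts the
    reference phase. Illustrative sketch only."""
    G = np.fft.fft2(gen_feat)
    R = np.fft.fft2(ref_feat)
    mag = np.abs(G)
    # Interpolate phase on the complex unit circle to avoid angle wrap-around.
    phase = np.angle((1 - alpha) * np.exp(1j * np.angle(G))
                     + alpha * np.exp(1j * np.angle(R)))
    return np.real(np.fft.ifft2(mag * np.exp(1j * phase)))

# Toy usage: structure (phase) comes from ref, appearance energy from gen.
rng = np.random.default_rng(0)
gen = rng.random((8, 8))
ref = rng.random((8, 8))
out = phase_transfer(gen, ref, alpha=0.7)
```

Keeping the magnitude while transplanting the phase reflects the classic observation that the phase spectrum carries most of an image's structural layout, which is why the hidden reference remains perceptible in the final illusion.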
Contribution/Results: Experiments demonstrate state-of-the-art performance in image quality, text fidelity, visual imperceptibility, and scene naturalness, surpassing existing zero-shot steganography methods. To the authors' knowledge, this is the first method to achieve high-quality illusion-based steganographic synthesis entirely without training or fine-tuning.
📝 Abstract
An optical-illusion hidden picture is an interesting visual perceptual phenomenon in which an image is cleverly integrated into another picture in a way that is not immediately obvious to the viewer. Built on an off-the-shelf text-to-image (T2I) diffusion model, we propose a novel training-free text-guided image-to-image (I2I) translation framework dubbed the **P**hase-**T**ransferred **Diffusion** Model (PTDiffusion) for hidden art synthesis. PTDiffusion embeds an input reference image into arbitrary scenes described by text prompts, while exhibiting hidden visual cues of the reference image. At the heart of our method is a plug-and-play phase transfer mechanism that dynamically and progressively transplants the phase spectrum of diffusion features from the denoising process that reconstructs the reference image into the process that samples the generated illusion image, realizing a harmonious fusion of the reference's structural information and the text's semantic information. Furthermore, we propose asynchronous phase transfer to enable flexible control over the degree of hidden-content discernibility. Our method bypasses any model training and fine-tuning, while substantially outperforming related methods in image quality, text fidelity, visual discernibility, and contextual naturalness for illusion picture synthesis, as demonstrated by extensive qualitative and quantitative experiments.
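The asynchronous phase transfer can be pictured as a per-step weight schedule: reference phase is injected only within a window of the denoising trajectory, and shrinking or shifting that window tunes how discernible the hidden image is. The following is a hypothetical sketch of such a schedule; the window bounds, the linear ramp, and all names are assumptions for illustration, not the paper's exact control strategy.

```python
def async_phase_weights(num_steps, start_frac=0.0, stop_frac=0.6):
    """Return a per-step phase-transfer weight alpha_t.

    Reference phase is transferred only for steps in
    [start_frac, stop_frac) of the trajectory, ramping down linearly
    so late steps are dominated by the text-described scene.
    A narrower window -> a less discernible hidden image.
    Illustrative schedule, not the paper's implementation."""
    start = int(num_steps * start_frac)
    stop = int(num_steps * stop_frac)
    weights = []
    for t in range(num_steps):
        if start <= t < stop:
            # Linear ramp from 1.0 at the window start down toward 0.
            weights.append(1.0 - (t - start) / max(stop - start, 1))
        else:
            weights.append(0.0)
    return weights

# Toy usage: 50 denoising steps, phase injected in the first 60%.
alphas = async_phase_weights(50, stop_frac=0.6)
```

Each `alphas[t]` would then feed a phase-blending step (such as the magnitude/phase swap sketched above in the summary) at denoising step `t`, so discernibility is controlled purely by scheduling rather than by retraining.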