🤖 AI Summary
Existing autoregressive text steganography methods suffer from slow generation, poor imperceptibility, and weak robustness against substitution attacks. This paper integrates diffusion models into text steganography, proposing GTSD, a generative text steganography method based on a diffusion model. GTSD employs prompt mapping for semantically guided conditional generation and batch mapping for dynamic candidate selection with parallel sampling, jointly improving steganographic capacity, efficiency, and security. Evaluated on standard benchmarks, GTSD achieves low detectability (steganalysis F1-score = 0.21), accelerates generation by 3.2× over baselines, reduces the substitution-attack success rate to 18.7%, improves BLEU-4 by 12.6%, and scales steganographic capacity with prompt capacity and batch size. Extensive experiments show that GTSD outperforms state-of-the-art methods on these metrics while maintaining comparable anti-steganalysis performance.
📝 Abstract
With the rapid development of deep learning, generative text steganography methods based on autoregressive models have achieved notable success. However, these autoregressive approaches have several limitations. First, existing methods must encode candidate words according to their output probabilities and generate each stego word one by one, making the generation process time-consuming. Second, encoding and selecting candidate words distorts the sampling distribution, which degrades the imperceptibility of the stego text. Third, existing methods have low robustness and cannot resist substitution attacks. To address these issues, we propose a generative text steganography method based on a diffusion model (GTSD), which improves generation speed, robustness, and imperceptibility while maintaining security. Specifically, we propose a novel steganography scheme built on a diffusion model that embeds secret information through prompt mapping and batch mapping. Prompt mapping converts secret information into a conditional prompt that guides the pre-trained diffusion model to generate batches of candidate sentences. Batch mapping then selects the stego text from these candidate sentences according to the secret information. Extensive experiments show that GTSD outperforms SOTA methods in generation speed, robustness, and imperceptibility while maintaining comparable anti-steganalysis performance. Moreover, we verify that GTSD has strong potential: its embedding capacity grows with the prompt capacity and the model batch size while security is maintained.
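The two embedding steps described above can be sketched in code. This is a minimal illustrative model, not the authors' implementation: the prompt codebook, the bit widths, the `generate_batch` sampler, and the assumption that the receiver can deterministically regenerate the same candidate batch are all simplifications introduced here. Some secret bits index a prompt in a shared codebook (prompt mapping); the diffusion model then samples a batch of candidate sentences in parallel, and the remaining bits index one candidate as the stego text (batch mapping).

```python
# Illustrative sketch of GTSD-style embedding (hypothetical names/API,
# not the paper's actual implementation).

# Shared between sender and receiver: a prompt codebook and a batch size.
PROMPT_CODEBOOK = ["topic: sports", "topic: travel", "topic: music", "topic: food"]
BATCH_SIZE = 8  # number of candidate sentences sampled per diffusion run


def bits_to_int(bits):
    """Interpret a list of 0/1 bits as a big-endian integer."""
    return int("".join(map(str, bits)), 2)


def embed(secret_bits, generate_batch):
    """Embed 2 + 3 = 5 secret bits: 2 bits pick the prompt (prompt mapping),
    3 bits pick one sentence from the generated batch (batch mapping)."""
    p_bits, b_bits = secret_bits[:2], secret_bits[2:5]
    prompt = PROMPT_CODEBOOK[bits_to_int(p_bits)]       # prompt mapping
    candidates = generate_batch(prompt, BATCH_SIZE)     # parallel sampling
    return prompt, candidates[bits_to_int(b_bits)]      # batch mapping


def extract(prompt, stego_text, generate_batch):
    """Receiver recovers both indices; assumes the sampler is deterministic
    so the same candidate batch can be regenerated."""
    p_idx = PROMPT_CODEBOOK.index(prompt)
    candidates = generate_batch(prompt, BATCH_SIZE)
    b_idx = candidates.index(stego_text)
    return [int(c) for c in f"{p_idx:02b}"] + [int(c) for c in f"{b_idx:03b}"]


def dummy_generate(prompt, n):
    """Deterministic stand-in for the diffusion sampler."""
    return [f"{prompt} / sentence {i}" for i in range(n)]
```

Under this toy setup the payload per stego sentence is log2(|codebook|) + log2(batch size) bits, which is why capacity grows with both the prompt capacity and the batch size, as the abstract notes.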