🤖 AI Summary
Amid escalating network censorship, balancing capacity, efficiency, and security in text steganography remains challenging. This paper proposes the first large language model (LLM)-based, high-capacity text steganography framework whose output is statistically indistinguishable from ordinary model output. The method works directly on the LLM's output distribution to embed secret bits without degrading text quality, so stego-texts are statistically indistinguishable from natural language. Key contributions include: (1) a mechanism that pseudorandomly shifts the probability interval so that the private bits drive steganographic sampling; and (2) a decoding-oriented reordering algorithm that minimizes interval splitting and significantly improves encoding/decoding efficiency. Experiments on mainstream LLMs demonstrate state-of-the-art performance: +38% embedding capacity, 2.1× faster encoding/decoding, and 100% decoding accuracy, addressing the long-standing trade-off among capacity, efficiency, and robustness in text steganography.
📝 Abstract
Amid escalating surveillance and censorship in cyberspace, personal privacy is under threat. This motivates steganography, which hides messages securely within innocent-looking texts. Previous methods alter existing texts to hide private messages, which is not secure. Large Language Models (LLMs) provide high-quality, explicit distributions, a mathematical tool on which secure steganography methods can be built. However, existing attempts fail to achieve high capacity, time efficiency, and correctness simultaneously, and their tightly coupled designs leave little room for refinement. To provide a secure, high-capacity, and efficient steganography method, we introduce ShiMer. Specifically, ShiMer pseudorandomly shifts the probability interval of the LLM's distribution to obtain a private distribution, and samples a token according to the private bits. Steganographic texts produced by ShiMer are indistinguishable in quality from normal texts generated directly by the language model. To further enhance the capacity of ShiMer, we design a reordering algorithm that minimizes the occurrence of interval splitting during the decoding phase. Experimental results indicate that our method achieves the highest capacity and efficiency among existing secure steganography techniques.
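To make the idea of shifting a probability interval and sampling according to private bits more concrete, here is a minimal toy sketch. It is not the ShiMer algorithm itself, only one plausible reading of the abstract. Everything in it is an assumption made for illustration: the names `embed`, `extract`, and `recoverable_bits`, the 16-bit `PRECISION`, a fixed integer-weight distribution standing in for the LLM's next-token distribution, and Python's non-cryptographic `random.Random` standing in for a keyed pseudorandom generator shared by sender and receiver. The reordering algorithm is not modeled.

```python
import random

PRECISION = 16            # toy resolution of the unit interval (assumption)
SIZE = 1 << PRECISION

def cumulative_bounds(counts):
    """Turn integer token weights summing to SIZE into [lo, hi) intervals."""
    bounds, lo = [], 0
    for c in counts:
        bounds.append((lo, lo + c))
        lo += c
    assert lo == SIZE, "weights must sum to SIZE"
    return bounds

def recoverable_bits(lo, hi, shift):
    """Message bits fixed by knowing the shifted point fell in [lo, hi)."""
    a = (lo - shift) % SIZE          # smallest candidate message value
    b = (hi - 1 - shift) % SIZE      # largest candidate message value
    if a > b:                        # interval wrapped around: it is split in two
        return 0, ""                 # candidates include 00..0 and 11..1
    k = PRECISION - (a ^ b).bit_length()   # length of the common bit prefix
    return k, format(a, f"0{PRECISION}b")[:k]

def embed(bits, counts, key):
    """Choose one token per step so that the tokens carry `bits`."""
    rng = random.Random(key)         # stand-in for a shared keyed PRNG
    bounds = cumulative_bounds(counts)
    tokens, pos = [], 0
    while pos < len(bits):
        shift = rng.randrange(SIZE)  # pseudorandom shift for this step
        window = bits[pos:pos + PRECISION].ljust(PRECISION, "0")
        point = (int(window, 2) + shift) % SIZE
        token = next(i for i, (lo, hi) in enumerate(bounds) if lo <= point < hi)
        k, _ = recoverable_bits(*bounds[token], shift)
        tokens.append(token)
        pos += k                     # only bits the receiver can recover are consumed
    return tokens

def extract(tokens, counts, key, n_bits):
    """Replay the same shifts and read back the bits fixed by each token."""
    rng = random.Random(key)
    bounds = cumulative_bounds(counts)
    out = ""
    for token in tokens:
        shift = rng.randrange(SIZE)
        _, prefix = recoverable_bits(*bounds[token], shift)
        out += prefix
    return out[:n_bits]

# Usage with a hypothetical four-token distribution (1/2, 1/4, 1/8, 1/8).
counts = [SIZE // 2, SIZE // 4, SIZE // 8, SIZE // 8]
secret = "1011001110001111"
stego_tokens = embed(secret, counts, key=1234)
assert extract(stego_tokens, counts, key=1234, n_bits=len(secret)) == secret
```

In this toy, a token whose shifted interval wraps around the end of the unit interval contributes no bits at that step. This gives one intuition for why reducing interval splitting could raise capacity, though the paper's actual mechanism may differ.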