A Content-Preserving Secure Linguistic Steganography

📅 2025-11-16
🤖 AI Summary
Existing linguistic steganography methods rely on content rewriting, which introduces detectable statistical anomalies that compromise covertness. This paper proposes a novel **content-preserving linguistic steganography paradigm**: rather than altering any character in the original text, it embeds secret information solely by controllably modulating the output probability distribution of a masked language model (MLM). Its key contributions are: (1) a dynamic distribution encoding strategy coupled with an enhanced masking mechanism that precisely identifies embedding positions; (2) fine-tuning of the MLM to construct a reversible steganographic model, enabling lossless mapping from plaintext to stegotext and 100% accurate secret extraction; and (3) a theoretical argument for perfect secrecy, together with empirical results showing significant improvements over state-of-the-art methods in embedding capacity, text naturalness, and security.

📝 Abstract
Existing linguistic steganography methods primarily rely on content transformations to conceal secret messages. However, they often cause subtle yet innocent-looking deviations between normal and stego texts, posing potential security risks in real-world applications. To address this challenge, we propose a content-preserving linguistic steganography paradigm for perfectly secure covert communication without modifying the cover text. Based on this paradigm, we introduce CLstega (Content-preserving Linguistic steganography), a novel method that embeds secret messages through controllable distribution transformation. CLstega first applies an augmented masking strategy to locate and mask embedding positions, where the probability distributions predicted by a masked language model (MLM) are easily adjustable for transformation. Subsequently, a dynamic distribution steganographic coding strategy encodes secret messages by deriving target distributions from the original probability distributions. To achieve this transformation, CLstega carefully selects target words for the embedding positions as labels to construct a masked sentence dataset, which is used to fine-tune the original MLM, producing a target MLM capable of directly extracting secret messages from the cover text. This approach ensures perfect security of secret messages while fully preserving the integrity of the original cover text. Experimental results show that CLstega achieves a 100% extraction success rate and outperforms existing methods in security, effectively balancing embedding capacity and security.
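The embed-then-extract idea in the abstract can be illustrated with a toy sketch. This is not the authors' coding scheme: it assumes, purely for illustration, that secret bits index into the top 2^k words of the original distribution at a masked position, and that extraction reads the top-1 word of the target distribution. The fine-tuning step is replaced by a hand-built distribution, and all function names (`candidate_pool`, `choose_target_word`, `mock_target_distribution`, `extract_bits`) are hypothetical.

```python
from typing import Dict, List


def candidate_pool(probs: Dict[str, float], k: int) -> List[str]:
    """Top 2**k words of a distribution, highest probability first."""
    # Ties are broken alphabetically so sender and receiver agree on the order.
    ranked = sorted(probs, key=lambda w: (-probs[w], w))
    size = 2 ** k
    if len(ranked) < size:
        raise ValueError("not enough candidate words to carry k bits")
    return ranked[:size]


def choose_target_word(original: Dict[str, float], bits: str) -> str:
    """Pick the label word whose rank in the original pool encodes `bits`."""
    return candidate_pool(original, len(bits))[int(bits, 2)]


def mock_target_distribution(
    original: Dict[str, float], target_word: str, peak: float = 0.9
) -> Dict[str, float]:
    """Stand-in for fine-tuning (assumption): put `peak` mass on the target
    word and rescale the remaining words proportionally."""
    rest = sum(p for w, p in original.items() if w != target_word)
    scale = (1.0 - peak) / rest if rest > 0 else 0.0
    return {
        w: peak if w == target_word else p * scale
        for w, p in original.items()
    }


def extract_bits(
    original: Dict[str, float], target: Dict[str, float], k: int
) -> str:
    """Recover k bits from the rank of the target model's top-1 word."""
    pool = candidate_pool(original, k)
    top_word = max(target, key=target.get)
    return format(pool.index(top_word), f"0{k}b")


# Toy distribution at one masked position (made-up numbers).
original = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}
label = choose_target_word(original, "10")
target = mock_target_distribution(original, label)
recovered = extract_bits(original, target, k=2)
```

In this sketch the cover sentence itself never changes; only the distribution at the masked position does, which mirrors the content-preserving claim. A real pipeline would obtain both distributions from actual MLM forward passes rather than from hand-written dictionaries.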
Problem

Research questions and friction points this paper is trying to address.

Addresses security risks in linguistic steganography caused by text deviations
Preserves cover text integrity while embedding secret messages securely
Achieves perfect security through controllable distribution transformation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Content-preserving embedding without modifying cover text
Controllable distribution transformation using masked language model
Dynamic steganographic coding for perfect security and integrity
Lingyun Xiang
School of Computer Science and Technology, Changsha University of Science and Technology
Chengfu Ou
College of Cyberspace Security, Jinan University
Xu He
School of Computer Science and Technology, Changsha University of Science and Technology
Zhongliang Yang
Associate Professor, Beijing University of Posts and Telecommunications
AI Security, FinTech
Yuling Liu
College of Cyber Science and Technology, Hunan University