A Character-based Diffusion Embedding Algorithm for Enhancing the Generation Quality of Generative Linguistic Steganographic Texts

📅 2025-05-02
🤖 AI Summary
Generative linguistic steganography faces a fundamental challenge: low-quality stegotext. Existing models have limited generative capacity, and conventional embedding algorithms treat the properties of the sensitive information (e.g., its semantic content or randomness) as noise, which forces the selection of low-probability tokens and degrades semantic coherence and fluency. To address this, we propose the Character-based Diffusion Embedding Algorithm (CDEA), the first method to transform character-level statistical properties of sensitive information into constructive signals. CDEA integrates power-law-based grouping with diffusion-inspired frequency modulation over candidate words, raising the selection frequency of high-probability tokens and lowering that of low-probability ones. It is combined with XLNet, which handles the transformation of sensitive information in long sequences, preserving high extraction accuracy while substantially improving perceptual imperceptibility. Experiments demonstrate that CDEA consistently outperforms state-of-the-art methods across BLEU, perplexity, and human evaluation metrics.

📝 Abstract
Generating high-quality steganographic text is a fundamental challenge in the field of generative linguistic steganography. This challenge arises primarily from two aspects: firstly, the capabilities of existing models in text generation are limited; secondly, embedding algorithms fail to effectively mitigate the negative impacts of sensitive information's properties, such as semantic content or randomness. Specifically, to ensure that the recipient can accurately extract hidden information, embedding algorithms often have to consider selecting candidate words with relatively low probabilities. This phenomenon leads to a decrease in the number of high-probability candidate words and an increase in low-probability candidate words, thereby compromising the semantic coherence and logical fluency of the steganographic text and diminishing the overall quality of the generated steganographic material. To address this issue, this paper proposes a novel embedding algorithm, character-based diffusion embedding algorithm (CDEA). Unlike existing embedding algorithms that strive to eliminate the impact of sensitive information's properties on the generation process, CDEA leverages sensitive information's properties. It enhances the selection frequency of high-probability candidate words in the candidate pool based on general statistical properties at the character level and grouping methods based on power-law distributions, while reducing the selection frequency of low-probability candidate words in the candidate pool. Furthermore, to ensure the effective transformation of sensitive information in long sequences, we also introduce the XLNet model. Experimental results demonstrate that the combination of CDEA and XLNet significantly improves the quality of generated steganographic text, particularly in terms of perceptual-imperceptibility.

Problem

Research questions and friction points this paper is trying to address.

Improving steganographic text quality in generative linguistic steganography
Reducing low-probability word selection to enhance semantic coherence
Leveraging character-level properties to optimize candidate word selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Character-based diffusion embedding algorithm enhances steganographic text quality
Leverages sensitive information properties to improve candidate word selection
Integrates XLNet for effective long-sequence sensitive information transformation
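
The intuition in the abstract, that character-level statistics of the secret message can steer selection toward high-probability candidate words, can be illustrated with a toy sketch. This is not the paper's CDEA: the rank-to-rank mapping, the letter-frequency order `CHARS_BY_FREQ`, the helper names, and the fixed power-law candidate pool from `toy_pool` are all assumptions made for illustration. In real generation the candidate pool would come from a language model and change at every step.

```python
# Toy illustration only: a hypothetical rank-to-rank character embedding.
# Assumption: approximate English letter order by descending frequency,
# standing in for "general statistical properties at the character level".
CHARS_BY_FREQ = "etaoinshrdlcumwfgypbvkjxqz"


def toy_pool(size=26):
    """Hypothetical candidate pool whose probabilities follow a power law (1/rank)."""
    weights = [1.0 / (i + 1) for i in range(size)]
    total = sum(weights)
    return [(f"w{i}", w / total) for i, w in enumerate(weights)]


def rank_candidates(candidates):
    """Sort (word, probability) pairs from most to least probable."""
    return sorted(candidates, key=lambda wp: wp[1], reverse=True)


def embed_char(ch, candidates):
    """Select the candidate whose probability rank equals the character's frequency rank."""
    ranked = rank_candidates(candidates)
    return ranked[CHARS_BY_FREQ.index(ch)][0]


def extract_char(word, candidates):
    """Invert embed_char: recover the character from the chosen word's rank."""
    ranked_words = [w for w, _ in rank_candidates(candidates)]
    return CHARS_BY_FREQ[ranked_words.index(word)]


def mean_rank(message):
    """Average candidate rank selected when embedding `message` (lower is better)."""
    ranks = [CHARS_BY_FREQ.index(c) for c in message]
    return sum(ranks) / len(ranks)


pool = toy_pool()
message = "secret"
stego_words = [embed_char(c, pool) for c in message]
recovered = "".join(extract_char(w, pool) for w in stego_words)
```

Because frequent characters map to top-ranked candidates, the average selected rank for ordinary text falls below the midpoint rank of 12.5 that a uniform choice over 26 candidates would give. This is the effect the abstract describes as raising the selection frequency of high-probability candidate words.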