A high-capacity linguistic steganography based on entropy-driven rank-token mapping

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing linguistic steganography methods face fundamental trade-offs between capacity and security: modification-based approaches are vulnerable to detection, retrieval-based strategies suffer from limited capacity, and generative methods are constrained by low token-prediction entropy. This paper proposes RTMStega, a novel framework introducing an entropy-driven rank-token mapping mechanism. It integrates rank-based adaptive encoding, normalized entropy-guided dynamic sampling, and context-aware decompression to break through the capacity bottleneck while preserving textual naturalness. Unlike prior methods, RTMStega relies neither on explicit text modification nor on fixed vocabularies; instead, it leverages the inherent ordinal structure of large language model (LLM) logit distributions to embed high-entropy information. Experiments demonstrate that RTMStega achieves three times the steganographic capacity of state-of-the-art baselines, improves inference speed by over 50%, and maintains superior text quality and robustness against steganalysis across multiple datasets and LLMs.

📝 Abstract
Linguistic steganography enables covert communication through embedding secret messages into innocuous texts; however, current methods face critical limitations in payload capacity and security. Traditional modification-based methods introduce detectable anomalies, while retrieval-based strategies suffer from low embedding capacity. Modern generative steganography leverages language models to generate natural stego text but struggles with limited entropy in token predictions, further constraining capacity. To address these issues, we propose an entropy-driven framework called RTMStega that integrates rank-based adaptive coding and context-aware decompression with normalized entropy. By mapping secret messages to token probability ranks and dynamically adjusting sampling via context-aware entropy-based adjustments, RTMStega achieves a balance between payload capacity and imperceptibility. Experiments across diverse datasets and models demonstrate that RTMStega triples the payload capacity of mainstream generative steganography, reduces processing time by over 50%, and maintains high text quality, offering a trustworthy solution for secure and efficient covert communication.
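The abstract's key quantity is the normalized entropy of the model's next-token distribution, which governs how much payload each generation step can carry. The paper's exact formulation is not reproduced here; the following is a minimal sketch, assuming the common definition of Shannon entropy scaled by its maximum over the candidate set:

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a next-token distribution, scaled to [0, 1]."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return h / math.log2(len(probs)) if len(probs) > 1 else 0.0

# High-entropy step: many plausible tokens, so more ranks are safely usable.
print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))  # 1.0
# Low-entropy step: one dominant token, so little can be hidden without
# degrading naturalness.
print(normalized_entropy([0.97, 0.01, 0.01, 0.01]))
```

The intuition matches the abstract's claim: steps where the model is uncertain (entropy near 1) tolerate lower-probability token choices, while near-deterministic steps constrain embedding.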
Problem

Research questions and friction points this paper is trying to address.

Enhancing payload capacity in linguistic steganography methods
Reducing detectable anomalies in generated stego text
Overcoming limited entropy constraints in token predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses rank-based adaptive coding for token mapping
Implements context-aware entropy-driven decompression
Dynamically adjusts sampling via normalized entropy
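The bullets above can be sketched as a toy round trip: secret bits select the *rank* of the emitted token among probability-sorted candidates, and the number of bits consumed per step scales with normalized entropy. This is an illustrative assumption, not the paper's algorithm; `usable_bits`, its `max_bits` cap, and the fixed toy distribution are all hypothetical:

```python
import math

def usable_bits(probs, max_bits=4):
    """Toy rule (assumption): scale embeddable bits by normalized entropy."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    h_norm = h / math.log2(len(probs))
    return max(1, int(h_norm * max_bits))

def embed_step(bits, probs):
    """Consume bits from the message; pick the token at the matching rank."""
    k = usable_bits(probs)
    chunk, rest = bits[:k], bits[k:]
    rank = int(chunk.ljust(k, "0"), 2)  # bit chunk -> rank index
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    return order[rank], k, rest

def extract_step(token_id, probs):
    """Receiver side: recover the bits from the chosen token's rank."""
    k = usable_bits(probs)
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    return format(order.index(token_id), f"0{k}b")

# Hypothetical 16-token vocabulary with a flat (high-entropy) distribution.
probs = [1 / 16] * 16
tok, k, rest = embed_step("1011", probs)
print(extract_step(tok, probs))  # "1011"
```

Because both sides recompute the same distribution and the same entropy-driven bit budget from shared context, no explicit side channel is needed, which is consistent with the summary's claim of context-aware decompression.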
Jun Jiang
School of Cyber Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Weiming Zhang
School of Cyber Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
Nenghai Yu
University of Science and Technology of China
Computer Vision, Artificial Intelligence, Information Hiding
Kejiang Chen
Department of Electronic Engineering and Information Science, University of Science and Technology of China
information hiding, steganography, privacy-preserving