🤖 AI Summary
Existing provably secure steganographic methods for large language models suffer from limited embedding capacity because they use the available entropy inefficiently. This work proposes a novel steganographic scheme based on list decoding and a suffix-matching mechanism. By maintaining a candidate list that is guaranteed to contain the correct secret message, and by acting as a drop-in replacement for the model's standard random sampling module, the method provably achieves security, correctness, and a non-trivial lower bound on embedding capacity, thereby overcoming the bottleneck imposed by low-entropy cover texts. Experiments with three mainstream large language models against seven baseline approaches show that the proposed method substantially increases embedding capacity while retaining computational efficiency comparable to existing provably secure schemes.
📝 Abstract
Steganography embeds secret messages in seemingly innocuous carriers to enable covert communication under surveillance. Current Provably Secure Steganography (PSS) schemes based on language models can guarantee computational indistinguishability between covertext and stegotext. However, achieving high embedding capacity remains a challenge for existing PSS schemes. Their inefficient use of entropy makes them ill-suited to Large Language Models (LLMs), whose inherently low-entropy output distributions severely constrain the achievable embedding capacity. To address this, we propose a provably secure steganography scheme with a theoretically guaranteed high capacity. Our scheme is based on the concept of list decoding: rather than expending extra effort to identify the correct secret message directly, it maintains a set of candidates that contains the correct one. This strategy makes full use of the information content of the generated text, yielding higher capacity. To ensure correctness, we further introduce a suffix-matching mechanism that distinguishes the correct secret message from the other candidates. We provide theoretical proofs of both the security and the correctness of our scheme, together with a derivation of a lower bound on its capacity.
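The abstract does not spell out how the suffix-matching step works, so the following is only a toy illustration of the general idea of picking the correct message out of a candidate list with a checkable suffix. It assumes the sender appends a short keyed tag (here, the first `TAG_BITS` bits of HMAC-SHA256 over the message) before embedding, and that some decoder has already produced a list of candidate payloads. `TAG_BITS`, `tag`, `encode_payload`, and `select_candidate` are hypothetical names, not the paper's mechanism.

```python
from __future__ import annotations

import hashlib
import hmac

TAG_BITS = 16  # hypothetical length of the check suffix, in bits


def bits_from_bytes(data: bytes, n: int) -> str:
    """Return the first n bits of `data` as a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in data)[:n]


def tag(key: bytes, message_bits: str) -> str:
    """Keyed check suffix: the first TAG_BITS bits of HMAC-SHA256(key, message)."""
    digest = hmac.new(key, message_bits.encode("ascii"), hashlib.sha256).digest()
    return bits_from_bytes(digest, TAG_BITS)


def encode_payload(key: bytes, message_bits: str) -> str:
    """Sender: append the check suffix to the secret message before embedding."""
    return message_bits + tag(key, message_bits)


def select_candidate(key: bytes, candidates: list[str]) -> str | None:
    """Receiver: return the unique message whose suffix matches its own tag.

    Returns None when no candidate, or more than one distinct message,
    passes the check, instead of guessing.
    """
    matches = set()
    for cand in candidates:
        if len(cand) <= TAG_BITS:
            continue  # too short to hold a message plus its suffix
        body, suffix = cand[:-TAG_BITS], cand[-TAG_BITS:]
        if hmac.compare_digest(suffix, tag(key, body)):
            matches.add(body)
    return matches.pop() if len(matches) == 1 else None


if __name__ == "__main__":
    key = b"shared-secret-key"
    payload = encode_payload(key, "1011001110001111")
    # Stand-in for a decoder that narrowed the payload down to a short list.
    wrong = payload[:-1] + ("1" if payload[-1] == "0" else "0")
    print(select_candidate(key, [wrong, payload]))  # prints 1011001110001111
```

In this sketch a wrong candidate passes the check with probability about 2^-`TAG_BITS`, so the suffix length trades a little capacity for a lower chance of decoding error.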
Our approach is plug-and-play, requiring only a direct replacement of the model's standard random sampling module. Experiments on three LLMs and seven PSS baselines demonstrate that our method achieves computational efficiency comparable to prior PSS schemes while delivering a substantial improvement in embedding capacity.
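Since the scheme is described as a replacement for the model's random sampling module, it may help to see what such a drop-in sampler looks like for a conventional PSS baseline. The sketch below follows the general pattern of Meteor-style schemes (inverse-CDF sampling driven by secret bits XORed with a keyed pseudorandom mask); it is not the proposed list-decoding scheme. `PRECISION`, the HMAC-SHA256 mask keyed only on the step index, the fixed toy distribution standing in for the model's next-token probabilities, and all function names are assumptions for illustration; a real deployment would also need a fresh per-message nonce in the mask.

```python
from __future__ import annotations

import hashlib
import hmac
import random

PRECISION = 16  # hypothetical: bits of randomness consumed per sampled token


def cumulative_bounds(probs: list[float]) -> list[int]:
    """Integer CDF boundaries that partition [0, 2**PRECISION) among tokens."""
    total = 1 << PRECISION
    bounds, acc = [0], 0.0
    for p in probs:
        acc += p
        bounds.append(min(total, round(acc * total)))
    bounds[-1] = total  # absorb floating-point rounding in the last token
    return bounds


def sample(probs: list[float], r: int) -> int:
    """Inverse-CDF sampling: pick the token whose interval contains r."""
    bounds = cumulative_bounds(probs)
    for i in range(len(probs)):
        if bounds[i] <= r < bounds[i + 1]:
            return i
    raise ValueError("r is outside [0, 2**PRECISION)")


def sample_standard(probs: list[float], rng: random.Random) -> int:
    """Ordinary sampling module: r is drawn uniformly at random."""
    return sample(probs, rng.randrange(1 << PRECISION))


def keyed_mask(key: bytes, step: int) -> int:
    """Pseudorandom PRECISION-bit mask that sender and receiver both derive."""
    digest = hmac.new(key, step.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (1 << PRECISION)


def committed_bits(lo: int, hi: int) -> int:
    """Number of leading bits shared by every integer in [lo, hi]."""
    a, b = format(lo, f"0{PRECISION}b"), format(hi, f"0{PRECISION}b")
    n = 0
    while n < PRECISION and a[n] == b[n]:
        n += 1
    return n


def embed_step(key: bytes, step: int, probs: list[float], message_bits: str) -> tuple[int, int]:
    """Stego replacement for sample_standard: r = message bits XOR mask."""
    chunk = message_bits[:PRECISION].ljust(PRECISION, "0")  # zero-pad the tail
    r = int(chunk, 2) ^ keyed_mask(key, step)
    token = sample(probs, r)
    bounds = cumulative_bounds(probs)
    return token, committed_bits(bounds[token], bounds[token + 1] - 1)


def extract_step(key: bytes, step: int, probs: list[float], token: int) -> str:
    """Receiver: recover the bits fixed by the observed token's interval."""
    bounds = cumulative_bounds(probs)
    lo = bounds[token]
    n = committed_bits(lo, bounds[token + 1] - 1)
    prefix = format(lo, f"0{PRECISION}b")[:n]
    mask = format(keyed_mask(key, step), f"0{PRECISION}b")[:n]
    return "".join("1" if x != y else "0" for x, y in zip(prefix, mask))


def embed(key: bytes, probs: list[float], message_bits: str, steps: int) -> tuple[list[int], int]:
    """Generate `steps` tokens; return them and the number of bits committed."""
    tokens, pos = [], 0
    for step in range(steps):
        token, n = embed_step(key, step, probs, message_bits[pos:])
        tokens.append(token)
        pos += n
    return tokens, pos


def extract(key: bytes, probs: list[float], tokens: list[int]) -> str:
    """Concatenate the bits recovered from each observed token."""
    return "".join(extract_step(key, s, probs, t) for s, t in enumerate(tokens))
```

Because the mask is pseudorandom, `r` looks uniform, so the token distribution matches `sample_standard` as long as the mask cannot be told apart from true randomness. Only the bits shared by every value in the chosen token's interval are committed, so a token with probability p contributes at most about -log2 p bits and a near-certain token contributes almost nothing; this is one concrete form of the entropy loss on low-entropy LLM outputs that the abstract attributes to existing PSS schemes.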