🤖 AI Summary
Large language models (LLMs) can embed meaningful information within syntactically correct yet semantically unrelated cover texts of equal length, producing a severe decoupling between surface-level text and the author's true intent that poses critical risks to AI safety and communication integrity. This work proposes a lightweight text steganography protocol built on an open-source 8B-parameter LLM, combining local encoding and decoding to achieve second-scale embedding and high-fidelity reconstruction on commodity hardware. The method operates on plain text without special tokens or formatting, rendering embedded payloads imperceptible to both human readers and current detection models. Key contributions include: (1) the first systematic demonstration of the dissociation between semantic content and pragmatic intent in LLM outputs; (2) an efficient, reproducible text steganography paradigm; and (3) empirical challenges to prevailing assumptions in AI safety and to the concept of "model knowledge."
📝 Abstract
A meaningful text can be hidden inside another text of the same length that is completely different yet still coherent and plausible. For example, a tweet containing a harsh political critique could be embedded in a tweet that celebrates the same political leader, or an ordinary product review could conceal a secret manuscript. This uncanny state of affairs is now possible thanks to Large Language Models, and in this paper we present a simple and efficient protocol to achieve it. We show that even modest 8-billion-parameter open-source LLMs are sufficient to obtain high-quality results, and a message as long as this abstract can be encoded and decoded locally on a laptop in seconds. The existence of such a protocol demonstrates a radical decoupling of text from authorial intent, further eroding trust in written communication, already shaken by the rise of LLM chatbots. We illustrate this with a concrete scenario: a company could covertly deploy an unfiltered LLM by encoding its answers within the compliant responses of a safe model. This possibility raises urgent questions for AI safety and challenges our understanding of what it means for a Large Language Model to know something.
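The abstract describes encoding one text inside another of equal length using an LLM, but does not spell out the mechanism. A minimal sketch of one plausible rank-coupling scheme is below: at each position, the rank of the secret token under the model's predictions for the *secret* context selects the same-rank token under the predictions for the *cover* context; decoding inverts the mapping. This is an illustration, not the paper's actual protocol. A deterministic toy "model" (a hashed ordering of a tiny vocabulary) stands in for a real LLM's ranked next-token predictions; the vocabulary, prompts, and ranking function are all invented for the example.

```python
import hashlib

# Toy vocabulary; a real scheme would use the LLM's tokenizer vocabulary.
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran",
         "fast", "home", "sun", "rose", "over", "hill", "quiet", "town"]

def ranked_vocab(context):
    """Stand-in for an LLM's ranked next-token predictions: a deterministic
    pseudo-random ordering of the vocabulary derived from the context."""
    key = "|".join(context)
    return sorted(
        VOCAB,
        key=lambda t: hashlib.sha256((key + "#" + t).encode()).hexdigest(),
    )

def encode(secret, secret_prompt, cover_prompt):
    """Map each secret token's rank (under the secret context) to the
    same-rank token under the cover context."""
    cover = []
    for i, tok in enumerate(secret):
        r = ranked_vocab(secret_prompt + secret[:i]).index(tok)
        cover.append(ranked_vocab(cover_prompt + cover)[r])
    return cover

def decode(cover, secret_prompt, cover_prompt):
    """Invert the mapping: recover each rank from the cover context, then
    read off the same-rank token under the secret context."""
    secret = []
    for i, tok in enumerate(cover):
        r = ranked_vocab(cover_prompt + cover[:i]).index(tok)
        secret.append(ranked_vocab(secret_prompt + secret)[r])
    return secret

# Round trip: cover text has the same length as the secret, and decoding
# with the shared model and prompts reconstructs the secret exactly.
secret = ["the", "cat", "sat", "on", "a", "mat"]
cover = encode(secret, ["review:"], ["story:"])
assert decode(cover, ["review:"], ["story:"]) == secret
```

Both parties only need the same model and the two prompts, which matches the abstract's claim that a modest open-source model run locally suffices. A real implementation would also have to keep the cover text fluent: picking the rank-r token blindly can force an implausible word, so practical schemes restrict choices to high-probability tokens or use entropy-aware codes, at some cost in capacity.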