🤖 AI Summary
To mitigate the privacy and accountability risks that arise when users over-rely on proprietary large language models (LLMs) to process sensitive documents, this paper proposes a document-centric adversarial intervention: invisibly embedding human-imperceptible yet model-perceivable "phantom tokens" into text. The approach implements input-layer perturbations using Unicode variation selectors and zero-width characters, requires no access to model parameters, and therefore supports black-box deployment. Through cross-model robustness optimization, it achieves misdirection success rates above 86% across mainstream LLMs, including GPT-4, Claude, and Gemini. Its core contribution is the first non-destructive, deployable warning mechanism: by inducing models to generate outputs that appear plausible yet are semantically incorrect, it prompts users to reflect on their reliance, while fully preserving the original text's readability and document integrity.
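The perturbation idea in the summary above can be sketched as follows. This is a minimal illustration of the general technique (interleaving invisible Unicode code points into visible text), not the paper's actual token-selection or cross-model optimization procedure; the function names and the choice of code points are illustrative assumptions, and the real implementation is in the linked repository.

```python
# Illustrative sketch: hiding "phantom" code points inside visible text.
# ZERO WIDTH SPACE (U+200B) and VARIATION SELECTOR-1 (U+FE00) render with
# no visible glyph in most fonts, so the text looks unchanged to a human
# reader while the raw character stream a model tokenizes is different.
ZERO_WIDTH_SPACE = "\u200b"

def inject_phantom(text: str, payload: str = ZERO_WIDTH_SPACE, every: int = 3) -> str:
    """Insert an invisible code point after every `every` visible characters."""
    out = []
    for i, ch in enumerate(text, start=1):
        out.append(ch)
        if i % every == 0:
            out.append(payload)
    return "".join(out)

def strip_phantom(text: str, payload: str = ZERO_WIDTH_SPACE) -> str:
    """Remove the invisible code points to recover the original text."""
    return text.replace(payload, "")

original = "Summarize the attached contract."
perturbed = inject_phantom(original)
# The strings differ at the byte level but look identical when rendered,
# and the original is fully recoverable (non-destructive perturbation).
assert perturbed != original
assert strip_phantom(perturbed) == original
```

Because the payload characters are simply interleaved rather than substituted, the visible text is untouched and removal is lossless, which mirrors the summary's claim that readability and document integrity are preserved.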
📝 Abstract
The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement, and this form of over-reliance and misuse is emerging as a significant social issue. To mitigate these issues, we propose a method that injects imperceptible phantom tokens into documents, causing LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users. Through empirical evaluation, we demonstrate the effectiveness of our framework on proprietary LLMs, comparing its impact against several baselines. TRAPDOC serves as a strong foundation for promoting more responsible and thoughtful engagement with language models. Our code is available at https://github.com/jindong22/TrapDoc.