🤖 AI Summary
To mitigate the privacy and accountability risks that arise when users over-rely on proprietary large language models (LLMs) to process sensitive documents, this paper proposes a document-centric adversarial intervention: invisibly embedding human-imperceptible yet model-perceivable "phantom tokens" into text. The approach implements input-layer perturbations using Unicode variation selectors and zero-width characters, requires no access to model parameters, and therefore supports black-box deployment. Through cross-model robustness optimization, it achieves misdirection success rates above 86% across mainstream LLMs, including GPT-4, Claude, and Gemini. Its core contribution is the first non-destructive, deployable warning mechanism: by inducing models to generate outputs that appear plausible yet are semantically incorrect, it prompts users to reflect on their reliance, while fully preserving the original text's readability and document integrity.
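The perturbation idea in the summary above can be sketched as follows. This is a minimal illustration of the general technique (interleaving invisible Unicode code points into visible text), not the paper's actual token-selection or cross-model optimization procedure; the function names and the choice of code points are illustrative assumptions, and the real implementation is in the linked repository.

```python
# Illustrative sketch: hiding "phantom" code points inside visible text.
# ZERO WIDTH SPACE (U+200B) and VARIATION SELECTOR-1 (U+FE00) render with
# no visible glyph in most fonts, so the text looks unchanged to a human
# reader while the raw character stream a model tokenizes is different.
ZERO_WIDTH_SPACE = "\u200b"

def inject_phantom(text: str, payload: str = ZERO_WIDTH_SPACE, every: int = 3) -> str:
    """Insert an invisible code point after every `every` visible characters."""
    out = []
    for i, ch in enumerate(text, start=1):
        out.append(ch)
        if i % every == 0:
            out.append(payload)
    return "".join(out)

def strip_phantom(text: str, payload: str = ZERO_WIDTH_SPACE) -> str:
    """Remove the invisible code points to recover the original text."""
    return text.replace(payload, "")

original = "Summarize the attached contract."
perturbed = inject_phantom(original)
# The strings differ at the byte level but look identical when rendered,
# and the original is fully recoverable (non-destructive perturbation).
assert perturbed != original
assert strip_phantom(perturbed) == original
```

Because the payload characters are simply interleaved rather than substituted, the visible text is untouched and removal is lossless, which mirrors the summary's claim that readability and document integrity are preserved.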
📝 Abstract
The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement, and this form of over-reliance and misuse is emerging as a significant social issue. To mitigate these issues, we propose a method that injects imperceptible phantom tokens into documents, causing LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users. Through empirical evaluation, we demonstrate the effectiveness of our framework on proprietary LLMs, comparing its impact against several baselines. TRAPDOC serves as a strong foundation for promoting more responsible and thoughtful engagement with language models. Our code is available at https://github.com/jindong22/TrapDoc.