IPAD: Inverse Prompt for AI Detection -- A Robust and Explainable LLM-Generated Text Detector

📅 2025-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor robustness and limited interpretability of large language model (LLM) text detection under out-of-distribution (OOD) and adversarial settings, this paper proposes IPAD, an Inverse Prompt-based Detection framework. IPAD first reconstructs the latent original prompt of an input text via a Prompt Inverter, then employs a Distinguisher (developed in two versions) to quantify the semantic alignment between the reconstructed prompt and the text. This work introduces the novel "inverse prompt" paradigm, reframing detection as a consistency-discrimination task and thereby achieving strong generalization together with human-verifiable decision rationales. Experiments show that Distinguisher version 2 outperforms baselines by 9.73% in F1-score on in-distribution data and by 12.65% in AUROC on OOD data. A user study further confirms that IPAD's interpretable outputs significantly enhance detection credibility.
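The inverse-prompt idea described above can be sketched in a few lines: infer a candidate prompt from the input text, score how well the prompt and the text align, and flag the text as LLM-generated when alignment is high. Everything below is a hypothetical illustration, not the paper's implementation: the paper's Prompt Inverter and Distinguisher are trained LLM components, whereas here a first-words heuristic and a lexical cosine similarity stand in for them, and the 0.5 threshold is an arbitrary placeholder.

```python
import math
from collections import Counter

def invert_prompt(text: str) -> str:
    # Stand-in for IPAD's Prompt Inverter: in the paper, a model predicts
    # the prompt likely to have generated `text`. Here we fake one from
    # the text's opening words, purely for illustration.
    return "Write a short passage about: " + " ".join(text.split()[:8])

def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words cosine similarity between two strings.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def distinguish(text: str, threshold: float = 0.5) -> bool:
    """Return True if `text` is flagged as LLM-generated.

    Stand-in for IPAD's Distinguisher: the paper scores prompt-text
    alignment with an LLM; a lexical cosine plays that role here.
    """
    prompt = invert_prompt(text)
    alignment = cosine_similarity(prompt, text)
    return alignment >= threshold
```

The point of the sketch is the control flow, not the scoring function: detection is reduced to "does the text look like a faithful answer to its own reconstructed prompt?", which is what makes the decision evidence (the prompt and the alignment score) inspectable by a human reviewer.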

📝 Abstract
Large Language Models (LLMs) have attained human-level fluency in text generation, which complicates distinguishing between human-written and LLM-generated texts. This increases the risk of misuse and highlights the need for reliable detectors. Yet existing detectors exhibit poor robustness on out-of-distribution (OOD) and attacked data, which is critical for real-world scenarios. They also struggle to provide explainable evidence for their decisions, undermining their reliability. In light of these challenges, we propose IPAD (Inverse Prompt for AI Detection), a novel framework consisting of a Prompt Inverter that identifies predicted prompts that could have generated the input text, and a Distinguisher that examines how well the input text aligns with the predicted prompts. We develop and examine two versions of the Distinguisher. Empirical evaluations demonstrate that both Distinguishers perform significantly better than the baseline methods, with version 2 outperforming baselines by 9.73% on in-distribution data (F1-score) and 12.65% on OOD data (AUROC). Furthermore, a user study illustrates that IPAD enhances the trustworthiness of AI detection by allowing users to directly examine the decision-making evidence, providing interpretable support for its state-of-the-art detection results.
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated text reliably
Improving robustness on OOD data
Providing explainable AI detection evidence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inverse Prompt AI Detection
Explainable LLM-Generated Detector
Robust Out-of-Distribution Performance
Zheng Chen
Hong Kong University of Science and Technology, Hong Kong
Yushi Feng
The University of Hong Kong, Hong Kong
Changyang He
Max Planck Institute for Security and Privacy
Human-Computer Interaction · Social Computing · Responsible AI · Health Informatics
Yue Deng
Hong Kong University of Science and Technology, Hong Kong
Hongxi Pu
University of Michigan, United States
Bo Li
Hong Kong University of Science and Technology, Hong Kong