Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text

📅 2026-01-29
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work proposes a geometry-aware detection method based on a rewriting mechanism to address the risks of misinformation and academic integrity violations posed by large language model (LLM)-generated text. The approach introduces adaptive distance learning into a rewriting-based detection framework for the first time, dynamically learning the geometric distance between original and rewritten texts to significantly enhance detection performance. Theoretical analysis demonstrates the superiority of this adaptive strategy over fixed-distance metrics and elucidates its generalization mechanism. Extensive experiments across more than 100 configurations involving multiple mainstream LLMs show that the proposed method substantially outperforms existing baselines, achieving relative performance improvements ranging from 57.8% to 80.6% across different target models.

📝 Abstract
Modern large language models (LLMs) such as GPT, Claude, and Gemini have transformed the way we learn, work, and communicate. Yet their ability to produce highly human-like text raises serious concerns about misinformation and academic integrity, creating an urgent need for reliable algorithms to detect LLM-generated content. In this paper, we start by presenting a geometric approach to demystify rewrite-based detection algorithms, revealing their underlying rationale and demonstrating their generalization ability. Building on this insight, we introduce a novel rewrite-based detection algorithm that adaptively learns the distance between the original and rewritten text. Theoretically, we demonstrate that employing an adaptively learned distance function is more effective for detection than using a fixed distance. Empirically, we conduct extensive experiments with over 100 settings and find that our approach outperforms baseline algorithms in the majority of scenarios. In particular, it achieves relative improvements from 57.8% to 80.6% over the strongest baseline across different target LLMs (e.g., GPT, Claude, and Gemini).
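The core idea can be sketched in miniature: rewrite a candidate text, embed both versions, and score them with a weighted (learned) distance rather than a fixed one, on the intuition that an LLM changes its own output less than it changes human text when asked to rewrite it. The sketch below is illustrative only and assumes toy stand-ins throughout: a character n-gram "embedding" in place of an LLM encoder, a hand-supplied weight dictionary in place of the paper's learned metric, and a simple threshold rule; none of these names come from the paper.

```python
# Hedged sketch of rewrite-based detection with a learned distance.
# embed(), the weights, and the threshold are illustrative assumptions,
# not the paper's actual implementation.
import math
from collections import Counter

def embed(text, n=3):
    """Toy embedding: L2-normalized character n-gram counts
    (a stand-in for a real text encoder)."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    norm = math.sqrt(sum(v * v for v in grams.values())) or 1.0
    return {g: v / norm for g, v in grams.items()}

def learned_distance(original, rewritten, weights, default_w=1.0):
    """Weighted L1 distance between embeddings. The per-dimension weights
    play the role of the adaptively learned metric; a fixed distance is the
    special case where every weight equals default_w."""
    a, b = embed(original), embed(rewritten)
    keys = set(a) | set(b)
    return sum(weights.get(k, default_w) * abs(a.get(k, 0.0) - b.get(k, 0.0))
               for k in keys)

def detect_llm(original, rewritten, weights, threshold):
    """Small rewrite distance -> flag as likely LLM-generated."""
    return learned_distance(original, rewritten, weights) < threshold
```

In the paper the weights are learned from data so that the metric emphasizes the dimensions along which rewriting moves human text far but LLM text little; here they would simply be fitted by any standard classifier over the per-dimension differences.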
Problem

Research questions and friction points this paper is trying to address.

LLM-generated text detection
distance learning
text authenticity
misinformation
academic integrity
Innovation

Methods, ideas, or system contributions that make the work stand out.

distance learning
LLM-generated text detection
adaptive distance function
rewrite-based detection
geometric analysis