🤖 AI Summary
To address the challenge of reliably detecting short texts generated by large language models (LLMs), this paper proposes a zero-shot topological detection method. The approach controllably injects off-topic content into the input text before embedding, which stabilizes the topological structure of the resulting embedding point cloud. It is the first method to combine off-topic injection with persistent homology dimension (PHD) analysis, substantially improving the robustness and discriminability of topological features for short texts and overcoming the short-text limitation of prior zero-shot detectors. Concretely, the method constructs point clouds from text embeddings, computes PHDs as discriminative features, and performs zero-shot classification via unsupervised thresholding. Evaluated on multiple public and synthetic short-text datasets, it improves average detection accuracy by 12.6% over state-of-the-art zero-shot methods. The implementation is publicly available.
📝 Abstract
Malicious use of large language models (LLMs) has motivated the detection of LLM-generated text. Prior work in topological data analysis shows that the persistent homology dimension (PHD) of text embeddings yields a more robust score than other zero-shot methods. However, effectively detecting short LLM-generated texts remains a challenge. This paper presents Short-PHD, a zero-shot LLM-generated text detection method tailored for short texts. Short-PHD stabilizes the PHD estimate of the earlier method for short texts by inserting off-topic content before the given input text, and it identifies LLM-generated text against an established detection threshold. Experimental results on both public and generated datasets demonstrate that Short-PHD outperforms existing zero-shot methods in short LLM-generated text detection. Implementation code is available online.
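The pipeline described above (point cloud from embeddings → PHD as a score → threshold) can be sketched with a common minimum-spanning-tree estimator of persistent homology dimension; this is an illustrative approximation under assumed details, not the paper's actual implementation, and the function names and the `threshold` value are hypothetical.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree


def mst_total_length(points: np.ndarray) -> float:
    """Total edge length of the Euclidean minimum spanning tree."""
    dists = distance_matrix(points, points)
    return float(minimum_spanning_tree(dists).sum())


def phd_estimate(points: np.ndarray,
                 sizes=(100, 200, 400, 800),
                 repeats=5,
                 seed=0) -> float:
    """MST-based intrinsic-dimension estimate, a standard proxy for PHD.

    For n points of intrinsic dimension d, the MST length scales as
    n**((d-1)/d); fitting the log-log slope m gives d = 1 / (1 - m).
    """
    rng = np.random.default_rng(seed)
    log_n, log_e = [], []
    for n in sizes:
        for _ in range(repeats):
            idx = rng.choice(len(points), size=n, replace=False)
            log_n.append(np.log(n))
            log_e.append(np.log(mst_total_length(points[idx])))
    slope, _ = np.polyfit(log_n, log_e, 1)
    return 1.0 / (1.0 - slope)


def is_llm_generated(embeddings: np.ndarray, threshold: float = 9.0) -> bool:
    """Zero-shot decision: flag the text if its PHD falls below a
    pre-established threshold (hypothetical value)."""
    return phd_estimate(embeddings) < threshold
```

In Short-PHD's setting, `embeddings` would come from the token embeddings of the off-topic-prefixed input; the off-topic injection enlarges the point cloud so that this scaling fit is stable even for short texts.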