Fine-Grained Detection of AI-Generated Text Using Sentence-Level Segmentation

📅 2025-09-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional AI text detectors typically operate at the document level, which makes them ineffective against AI-generated content that has been mixed with human writing or edited to evade detection, since such texts blur the boundary between human and AI authorship. To address this, the paper proposes a sentence-level sequence labeling framework for collaborative writing that localizes the transition points between human- and AI-generated segments. The method integrates a pretrained Transformer encoder, a neural network component, and a conditional random field (CRF) layer to jointly model semantic and syntactic features alongside label sequence dependencies. Evaluated on two public benchmarks, the approach is reported to outperform zero-shot detectors and existing state-of-the-art models, accurately identifying AI-generated spans even in fully collaborative texts. The work applies fine-grained sequence labeling to AI text provenance, supporting interpretable, span-level analysis of human-AI collaborative content.

📝 Abstract

AI-generated text is now routinely used in important written work, which creates opportunities for misuse at many levels. Traditional AI detectors usually rely on document-level classification, which struggles to identify AI content in hybrid or lightly edited texts designed to avoid detection. This raises concerns about how reliably such detectors can distinguish human-written from AI-generated text. We propose a sentence-level sequence labeling model that detects transitions between human- and AI-generated text by leveraging nuanced linguistic signals that document-level classifiers overlook. With this method, AI-generated and human-written text within a single document can be detected and segmented at token-level granularity. Our model combines state-of-the-art pre-trained Transformer models with Neural Networks (NN) and Conditional Random Fields (CRFs). The Transformer extracts semantic and syntactic patterns, and the neural network component captures enhanced sequence-level representations; together these improve the boundary predictions of the CRF layer, which strengthens sequence recognition and the identification of the partition between human- and AI-generated text. We evaluate on two publicly available benchmark datasets containing collaborative human- and AI-generated texts. We compare against zero-shot detectors and existing state-of-the-art models, and run rigorous ablation studies to show that this approach can accurately detect spans of AI text in fully collaborative documents. All source code and the processed datasets are available in our GitHub repository.
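To illustrate why a CRF layer can help boundary prediction, here is a minimal sketch of Viterbi decoding over per-sentence label scores. It is not the authors' implementation: the label set, the emission scores (standing in for the Transformer and neural network outputs), and the transition scores are all made-up values chosen for illustration.

```python
from typing import List, Sequence

# Hypothetical label set: index 0 = human-written, index 1 = AI-generated.
LABELS = ["human", "ai"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence.

    emissions[t][k]   : score of label k for sentence t (assumed encoder output).
    transitions[j][k] : score of moving from label j to label k (assumed CRF weights).
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for k in range(n_labels):
            # Best previous label for ending in label k at this sentence.
            best_prev = max(range(n_labels), key=lambda j: scores[j] + transitions[j][k])
            new_scores.append(scores[best_prev] + transitions[best_prev][k] + emit[k])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(n_labels), key=lambda k: scores[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Illustrative scores for five sentences; sentence 2 weakly looks like AI text.
EMISSIONS = [[2.0, 0.1], [0.4, 0.6], [2.0, 0.0], [0.0, 2.5], [0.0, 2.0]]
# Staying in the same label is rewarded, switching is penalized.
TRANSITIONS = [[1.0, -1.0], [-1.0, 1.0]]

greedy = [max(range(2), key=lambda k: e[k]) for e in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS)
print([LABELS[i] for i in greedy])   # per-sentence argmax
print([LABELS[i] for i in decoded])  # sequence-level decoding
```

With these illustrative numbers, the per-sentence argmax flips to "ai" for the second sentence alone, while the sequence-level decoding keeps a single human-to-AI boundary, because the transition scores penalize isolated label switches.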

Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated text in hybrid documents using sentence-level segmentation

Identifying transitions between human-written and AI-generated content sequences

Improving detection accuracy for lightly edited AI texts designed to evade detection
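The segmentation task described in the points above can be made concrete with a small sketch. Assuming a detector has already assigned one label per sentence, the hypothetical helpers below (the names `label_spans` and `transition_points` are illustrative, not from the paper) recover the contiguous spans and the indices where authorship changes.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def label_spans(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive identical labels into (label, start, end) spans, end exclusive."""
    spans = []
    start = 0
    for label, group in groupby(labels):
        length = len(list(group))
        spans.append((label, start, start + length))
        start += length
    return spans


def transition_points(labels: Sequence[str]) -> List[int]:
    """Return each sentence index whose label differs from the previous sentence's label."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Illustrative per-sentence predictions for a five-sentence hybrid document.
predicted = ["human", "human", "ai", "ai", "human"]
print(label_spans(predicted))
print(transition_points(predicted))
```

A document-level classifier would reduce this whole document to a single label, whereas the span view keeps both boundaries (before sentence 2 and before sentence 4) visible.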

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sentence-level sequence labeling model for AI detection

Combines Transformers with Neural Networks and CRF

Token-level granularity identifies human-AI text transitions

🔎 Similar Papers

Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods