DAMASHA: Detecting AI in Mixed Adversarial Texts via Segmentation with Human-interpretable Attribution

📅 2025-12-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the segmentation of authorship transition points in hybrid-authored texts, that is, texts written jointly by humans and AI. We propose the Info-Mask framework and Human-Interpretable Attribution (HIA) overlays, and introduce MAS, an adversarial benchmark dataset designed for this task. The method combines stylometric features, perplexity signals, and structured boundary modeling, aiming for both robustness and interpretability. Evaluation across multiple architectures, together with a small-scale human study, shows that Info-Mask significantly improves span-level robustness under adversarial perturbations and establishes new baselines. The study also reveals remaining challenges for mixed-authorship segmentation and discusses their implications for trust and oversight in human-AI co-authorship.

📝 Abstract

In the age of advanced large language models (LLMs), the boundaries between human and AI-generated text are becoming increasingly blurred. We address the challenge of segmenting mixed-authorship text, that is, identifying transition points in text where authorship shifts from human to AI or vice versa, a problem with critical implications for authenticity, trust, and human oversight. We introduce a novel framework, called Info-Mask, for mixed-authorship detection that integrates stylometric cues, perplexity-driven signals, and structured boundary modeling to accurately segment collaborative human-AI content. To evaluate the robustness of our system against adversarial perturbations, we construct and release an adversarial benchmark dataset, Mixed-text Adversarial setting for Segmentation (MAS), designed to probe the limits of existing detectors. Beyond segmentation accuracy, we introduce Human-Interpretable Attribution (HIA) overlays that highlight how stylometric features inform boundary predictions, and we conduct a small-scale human study assessing their usefulness. Across multiple architectures, Info-Mask significantly improves span-level robustness under adversarial conditions, establishing new baselines while revealing remaining challenges. Our findings highlight both the promise and limitations of adversarially robust, interpretable mixed-authorship detection, with implications for trust and oversight in human-AI co-authorship.

Problem

Research questions and friction points this paper is trying to address.

Segmenting mixed human-AI authorship in text

Detecting transition points between human and AI content

Evaluating robustness against adversarial perturbations in detection
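
To make the task concrete, the sketch below shows one way to represent transition points and authorship spans, assuming each sentence already carries a predicted label. The label strings and the helper names `transition_points` and `to_spans` are hypothetical choices for illustration; the paper's own representation may differ.

```python
from typing import List, Tuple


def transition_points(labels: List[str]) -> List[int]:
    """Return every index i where the label at i differs from the label at i - 1."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Group consecutive identical labels into (start, end_exclusive, label) spans."""
    spans: List[Tuple[int, int, str]] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current span at the end of the input or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            spans.append((start, i, labels[start]))
            start = i
    return spans


# Hypothetical per-sentence labels for a five-sentence document.
example = ["human", "human", "ai", "ai", "human"]
print(transition_points(example))  # [2, 4]
print(to_spans(example))  # [(0, 2, 'human'), (2, 4, 'ai'), (4, 5, 'human')]
```

Working with spans rather than isolated labels makes span-level comparison between predicted and reference segmentations straightforward.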

Innovation

Methods, ideas, or system contributions that make the work stand out.

Segments text using stylometric and perplexity signals

Introduces adversarial benchmark dataset for robustness testing

Provides human-interpretable attribution overlays for predictions
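
As a rough illustration of the two signal families named above, the following sketch computes perplexity from per-token log-probabilities and a few simple stylometric statistics for a sentence. This is not the Info-Mask implementation: the feature set, the function names, and the assumption that natural-log token probabilities come from some external language model are all hypothetical.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def stylometric_features(sentence: str) -> Dict[str, float]:
    """Toy stylometric statistics computed from whitespace-separated tokens."""
    words = sentence.split()
    if not words:
        return {"num_words": 0.0, "mean_word_length": 0.0, "type_token_ratio": 0.0}
    return {
        "num_words": float(len(words)),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Fraction of distinct (case-insensitive) words among all words.
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }


# Assumed log-probabilities; a real system would obtain them from a language model.
print(perplexity([-1.0, -1.0]))
print(stylometric_features("the cat the dog"))
```

Per-sentence features like these could feed a boundary model, and because each one is a named quantity, it can also be surfaced to readers as part of an attribution overlay.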

🔎 Similar Papers

Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text