🤖 AI Summary
This study addresses the misuse risk posed by large language models (LLMs) generating highly realistic, evasive text on social platforms like Twitter. We construct nine Twitter-specific datasets and systematically evaluate tweet-generation capabilities of Llama 3, Mistral, Qwen2, and GPT-4o under both enabled and disabled content moderation. We first reveal that disabling moderation critically undermines detectability of 7B/8B open-weight LLMs: mainstream AIGC detectors suffer >40% average accuracy degradation. Through multidimensional text quality analysis—including semantic similarity, lexical diversity, n-gram distribution, and syntactic complexity—alongside cross-model/cross-configuration detection benchmarking and domain-adaptive modeling, we find that while moderation reduces lexical diversity, it enhances structural regularity—paradoxically creating detection blind spots. This work fills a critical gap in understanding how content moderation interventions and domain adaptation jointly affect AIGC detection mechanisms.
📝 Abstract
The rapid development of large language models (LLMs) has significantly improved the generation of fluent and convincing text, raising concerns about their potential misuse on social media platforms. We present a comprehensive methodology for creating nine Twitter datasets to examine the generative capabilities of four prominent LLMs: Llama 3, Mistral, Qwen2, and GPT-4o. These datasets encompass four censored and five uncensored model configurations, including 7B and 8B parameter base-instruction models of the three open-source LLMs. Additionally, we perform a data quality analysis to assess the characteristics of textual outputs from human, "censored," and "uncensored" models, employing semantic meaning, lexical richness, structural patterns, content characteristics, and detector performance metrics to identify differences and similarities. Our evaluation demonstrates that "uncensored" models significantly undermine the effectiveness of automated detection methods. This study addresses a critical gap by exploring smaller open-source models and the ramifications of "uncensoring," providing valuable insights into how domain adaptation and content moderation strategies influence both the detectability and structural characteristics of machine-generated text.