RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing AIGT detection datasets primarily focus on static text and fail to capture the user-driven, temporally evolving dynamics of AI-generated content on social media. To address this, we introduce RedNote-Vibe, the first five-year longitudinal dataset sourced from Xiaohongshu, comprising timestamped AI- and human-authored texts alongside user engagement metrics (e.g., likes, comments). We propose PLAD (PsychoLinguistic AIGT Detection Framework), an interpretable framework that leverages psycholinguistic features. Experiments show that PLAD achieves superior AIGT detection performance while uncovering systematic temporal patterns in linguistic features and their associations with user participation metrics, particularly comment propensity. The RedNote-Vibe dataset is publicly available.

📝 Abstract

The proliferation of Large Language Models (LLMs) has led to widespread AI-Generated Text (AIGT) on social media platforms, creating unique challenges where content dynamics are driven by user engagement and evolve over time. However, existing datasets mainly address static AIGT detection. In this work, we introduce RedNote-Vibe, the first longitudinal (5-year) dataset for social media AIGT analysis. The dataset is sourced from the Xiaohongshu platform and contains user engagement metrics (e.g., likes, comments) and timestamps spanning from the pre-LLM period to July 2025, which enables research into the temporal dynamics and user interaction patterns of AIGT. Furthermore, to detect AIGT in the context of social media, we propose the PsychoLinguistic AIGT Detection Framework (PLAD), an interpretable approach that leverages psycholinguistic features. Our experiments show that PLAD achieves superior detection performance and provides insights into the signatures distinguishing human and AI-generated content. More importantly, it reveals the complex relationship between these linguistic features and social media engagement. The dataset is available at https://github.com/testuser03158/RedNote-Vibe.
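
To make the longitudinal angle concrete, here is a minimal sketch of how timestamped posts with engagement counts might be grouped by year to track the share of AI-generated text. The record layout (`timestamp`, `is_ai`, `likes`, `comments`) and the toy values are assumptions for illustration only; the actual RedNote-Vibe schema may differ.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records; field names and values are made up for illustration
# and are not taken from the RedNote-Vibe release.
posts = [
    {"timestamp": "2021-03-14", "is_ai": False, "likes": 120, "comments": 8},
    {"timestamp": "2021-09-02", "is_ai": False, "likes": 45, "comments": 3},
    {"timestamp": "2024-05-20", "is_ai": True, "likes": 300, "comments": 2},
    {"timestamp": "2024-11-11", "is_ai": False, "likes": 80, "comments": 10},
]


def yearly_summary(records):
    """Group posts by calendar year and compute per-year aggregates."""
    buckets = defaultdict(list)
    for record in records:
        year = datetime.strptime(record["timestamp"], "%Y-%m-%d").year
        buckets[year].append(record)

    summary = {}
    for year in sorted(buckets):
        group = buckets[year]
        n = len(group)
        summary[year] = {
            "n_posts": n,
            # Fraction of posts in this year labeled as AI-generated.
            "ai_share": sum(r["is_ai"] for r in group) / n,
            "mean_likes": sum(r["likes"] for r in group) / n,
            "mean_comments": sum(r["comments"] for r in group) / n,
        }
    return summary


summary = yearly_summary(posts)
for year, stats in summary.items():
    print(year, stats)
```

Grouping by year is only one possible granularity; monthly buckets or a split around a chosen pre-LLM cutoff date would follow the same pattern.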

Problem

Research questions and friction points this paper is trying to address.

Capturing temporal dynamics of AI-generated social media content

Analyzing user engagement patterns with AI-generated text over time

Developing interpretable detection methods using psycholinguistic features

Innovation

Methods, ideas, or system contributions that make the work stand out.

Longitudinal dataset captures five-year social media dynamics

Psycholinguistic framework detects AI-generated text interpretably

Analyzes linguistic features' relationship with user engagement
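
The link between a linguistic feature and engagement can be illustrated with a simple correlation. The sketch below is not PLAD: it uses a single surface feature (exclamation-mark rate) as a stand-in for a psycholinguistic feature and computes its Pearson correlation with comment counts on made-up posts. The function names `exclamation_rate` and `pearson` and all sample data are hypothetical.

```python
import math


def exclamation_rate(text):
    """Share of characters that are exclamation marks (ASCII or full-width)."""
    if not text:
        return 0.0
    return sum(ch in "!！" for ch in text) / len(text)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Toy (text, comment_count) pairs, invented for illustration only.
sample = [
    ("Great find!!", 12),
    ("Nice.", 1),
    ("Love it! So good!", 9),
    ("A plain description of the product", 2),
]

features = [exclamation_rate(text) for text, _ in sample]
comments = [count for _, count in sample]
r = pearson(features, comments)
print(f"correlation between exclamation rate and comments: {r:.3f}")
```

A correlation on toy data says nothing about the real dataset; the point is only the shape of the analysis, one interpretable feature per post paired with one engagement metric.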

🔎 Similar Papers

Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods