RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media

📅 2025-09-26
🤖 AI Summary
Existing AI-generated text (AIGT) detection datasets focus mainly on static text and do not capture the user-driven, temporally evolving dynamics of AI-generated content on social media. To address this, we introduce RedNote-Vibe, the first five-year longitudinal dataset for social media AIGT analysis. It is sourced from Xiaohongshu and comprises timestamped AI- and human-authored texts together with user engagement metrics (e.g., likes, comments). We propose PLAD (PsychoLinguistic AIGT Detection Framework), an interpretable detection approach built on psycholinguistic features. Experiments show that PLAD achieves superior AIGT detection performance while uncovering the linguistic signatures that distinguish human from AI-generated content and how those features relate to user engagement, particularly comment propensity. The RedNote-Vibe dataset is publicly released.

📝 Abstract
The proliferation of Large Language Models (LLMs) has led to widespread AI-Generated Text (AIGT) on social media platforms, creating unique challenges where content dynamics are driven by user engagement and evolve over time. However, existing datasets mainly support static AIGT detection. In this work, we introduce RedNote-Vibe, the first longitudinal (five-year) dataset for social media AIGT analysis. The dataset is sourced from the Xiaohongshu platform and contains user engagement metrics (e.g., likes, comments) and timestamps spanning from the pre-LLM period to July 2025, enabling research into the temporal dynamics and user interaction patterns of AIGT. Furthermore, to detect AIGT in the context of social media, we propose the PsychoLinguistic AIGT Detection Framework (PLAD), an interpretable approach that leverages psycholinguistic features. Our experiments show that PLAD achieves superior detection performance and provides insights into the signatures distinguishing human and AI-generated content. More importantly, it reveals the complex relationship between these linguistic features and social media engagement. The dataset is available at https://github.com/testuser03158/RedNote-Vibe.

Problem

Research questions and friction points this paper is trying to address.

Capturing temporal dynamics of AI-generated social media content
Analyzing user engagement patterns with AI-generated text over time
Developing interpretable detection methods using psycholinguistic features

Innovation

Methods, ideas, or system contributions that make the work stand out.

Longitudinal dataset captures five-year social media dynamics
Psycholinguistic framework detects AI-generated text interpretably
Analyzes linguistic features' relationship with user engagement
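
The kind of temporal and engagement analysis described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `Post` record, its field names, and the `yearly_summary` helper are hypothetical, and the released dataset's actual schema may differ. The sketch groups timestamped posts by year, then reports the share of AI-labeled posts and the mean comment count for AI versus human posts.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    # Hypothetical record layout; the released dataset's schema may differ.
    posted_on: date
    is_ai: bool      # True if the post is labeled AI-generated
    likes: int
    comments: int


def yearly_summary(posts):
    """Per year: AI share of posts and mean comments for AI vs. human posts."""
    buckets = defaultdict(lambda: {"ai": [], "human": []})
    for p in posts:
        key = "ai" if p.is_ai else "human"
        buckets[p.posted_on.year][key].append(p.comments)

    summary = {}
    for year in sorted(buckets):
        ai, human = buckets[year]["ai"], buckets[year]["human"]
        total = len(ai) + len(human)
        summary[year] = {
            "ai_share": len(ai) / total,
            # None marks a year with no posts of that kind.
            "mean_comments_ai": sum(ai) / len(ai) if ai else None,
            "mean_comments_human": sum(human) / len(human) if human else None,
        }
    return summary


# Toy data for illustration only; not drawn from the dataset.
posts = [
    Post(date(2021, 5, 1), False, 10, 4),
    Post(date(2021, 8, 9), False, 3, 2),
    Post(date(2024, 2, 3), True, 7, 1),
    Post(date(2024, 6, 30), False, 12, 5),
]
result = yearly_summary(posts)
```

Grouping by calendar year is only one possible choice; finer buckets (months, or periods before and after a given model release) would follow the same pattern.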
Yudong Li
Tsinghua University
Yufei Sun
Beijing University of Posts and Telecommunications
Yuhan Yao
Beijing University of Posts and Telecommunications
Peiru Yang
Tsinghua University
Wanyue Li
Hong Kong Metropolitan University
Jiajun Zou
Tsinghua University
Yongfeng Huang
PhD Student, Chinese University of Hong Kong
Natural Language Processing
Linlin Shen
Shenzhen University
Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis