🤖 AI Summary
Existing AIGT detection datasets focus primarily on static text and fail to capture the user-driven, temporally evolving dynamics of AI-generated content on social media. To address this, we introduce RedNote-Vibe, the first five-year longitudinal dataset for social media AIGT analysis. Sourced from Xiaohongshu, it comprises timestamped AI- and human-authored texts alongside timestamped user engagement metrics (e.g., likes, comments). We propose PLAD (PsychoLinguistic AIGT Detection Framework), an interpretable framework built on psycholinguistic features. Experiments show that PLAD achieves superior AIGT detection performance while uncovering systematic temporal patterns in linguistic features and their associations with user engagement metrics, particularly comment propensity. The RedNote-Vibe dataset is publicly released.
📝 Abstract
The proliferation of Large Language Models (LLMs) has led to widespread AI-Generated Text (AIGT) on social media platforms, creating unique challenges because content dynamics there are driven by user engagement and evolve over time. However, existing datasets mainly treat AIGT detection as a static task. In this work, we introduce RedNote-Vibe, the first longitudinal (five-year) dataset for social media AIGT analysis. The dataset is sourced from the Xiaohongshu platform and contains user engagement metrics (e.g., likes, comments) and timestamps spanning from the pre-LLM period to July 2025, which enables research into the temporal dynamics and user interaction patterns of AIGT. Furthermore, to detect AIGT in the context of social media, we propose the PsychoLinguistic AIGT Detection Framework (PLAD), an interpretable approach that leverages psycholinguistic features. Our experiments show that PLAD achieves superior detection performance and provides insights into the signatures that distinguish human-written from AI-generated content. More importantly, it reveals the complex relationship between these linguistic features and social media engagement. The dataset is available at https://github.com/testuser03158/RedNote-Vibe.