🤖 AI Summary
Interdisciplinary researchers face significant challenges in accessing text data annotated along multiple dimensions. Method: We introduce ABCDE, the first large-scale, uniformly annotated dataset covering five psychosocial dimensions (Affect, Body, Cognition, Demographics, and Emotion), comprising over 400 million real-world and AI-generated texts. Annotation combines multi-source web crawling, metadata alignment, hybrid human-in-the-loop labeling (crowdsourced, rule-based, and model-assisted), and a standardized feature ontology. Contribution/Results: ABCDE is the first dataset to systematically integrate these five dimensions into a unified, accessible schema, substantially lowering entry barriers for researchers outside computer science. Since its open release, it has enabled more than ten downstream applications, including affective modeling, intergenerational analysis, and narrative mining in the digital humanities, and has been adopted by six interdisciplinary research teams.
📝 Abstract
Work in Computational Affective Science and Computational Social Science explores a wide variety of research questions about people, emotions, behavior, and health. Such work often relies on language data that is first labeled with relevant information, such as the use of emotion words or the age of the speaker. Although many resources and algorithms exist to enable this type of labeling, discovering, accessing, and using them remains a substantial impediment, particularly for practitioners outside of computer science. Here, we present the ABCDE dataset (Affect, Body, Cognition, Demographics, and Emotion), a large-scale collection of over 400 million text utterances drawn from social media, blogs, books, and AI-generated sources. The dataset is annotated with a wide range of features relevant to computational affective and social science. ABCDE facilitates interdisciplinary research across numerous fields, including affective science, cognitive science, the digital humanities, sociology, political science, and computational linguistics.