The Moral Foundations Reddit Corpus

📅 2022-08-10
🏛️ arXiv.org
📈 Citations: 19
Influential: 7
🤖 AI Summary
Problem: Existing methods for identifying moral sentiment rely heavily on manually annotated data, yet prior annotated corpora are limited to Twitter. Method: We construct the Moral Foundations Reddit Corpus, a collection of 16,123 comments from 12 subreddits, each hand-annotated by at least three trained annotators for eight categories of moral sentiment (Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework. Contribution/Results: We report baseline moral-sentiment classification results for the corpus using a range of methods, including cross-domain classification and knowledge transfer. The corpus extends annotated moral-sentiment resources beyond Twitter and supports further study of moral rhetoric online.
📝 Abstract
Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, pro-environmental action, political engagement, and even participation in violent protests. Various computational methods in Natural Language Processing (NLP) have been used to detect moral sentiment from textual data, but in order to achieve better performances in such subjective tasks, large sets of hand-annotated training data are needed. Previous corpora annotated for moral sentiment have proven valuable, and have generated new insights both within NLP and across the social sciences, but have been limited to Twitter. To facilitate improving our understanding of the role of moral rhetoric, we present the Moral Foundations Reddit Corpus, a collection of 16,123 Reddit comments that have been curated from 12 distinct subreddits, hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework. We use a range of methodologies to provide baseline moral-sentiment classification results for this new corpus, e.g., cross-domain classification and knowledge transfer.
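Because each comment is labeled by at least three annotators, a classifier needs a rule for turning several annotations into one training label. The sketch below shows one common choice, a strict majority vote per category. It is an illustrative assumption, not the aggregation procedure documented for the corpus; the function name `majority_vote` and the input layout (one set of labels per annotator) are hypothetical.

```python
from collections import Counter

# Category names as listed in the abstract.
LABELS = [
    "Care", "Proportionality", "Equality", "Purity",
    "Authority", "Loyalty", "Thin Morality", "Implicit/Explicit Morality",
]


def majority_vote(annotations, labels=LABELS):
    """Return the labels assigned by a strict majority of annotators.

    annotations: list of sets, one set of labels per annotator
    (hypothetical layout, not the corpus's actual file format).
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    # Count each label at most once per annotator.
    counts = Counter(label for ann in annotations for label in set(ann))
    threshold = len(annotations) / 2
    return {label for label in labels if counts[label] > threshold}


# Three annotators, two of whom chose Care.
example = [{"Care"}, {"Care", "Purity"}, {"Equality"}]
print(majority_vote(example))
```

With three annotators, a label is kept only when at least two of them assigned it, so a comment on which the annotators fully disagree ends up with an empty label set.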
Problem

Research questions and friction points this paper is trying to address.

Creating a moral sentiment dataset from Reddit comments
Evaluating LLM performance on moral sentiment classification
Addressing limitations of existing Twitter-based moral corpora
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created Reddit corpus with moral sentiment annotations
Evaluated LLMs versus fine-tuned BERT models
Used updated Moral Foundations Theory framework
Jackson Trager
PhD Candidate, University of Southern California
Moral Psychology, Cultural Conflict, Tech and Society, Polarization & Hate, AI Ethics & Policy
Alireza S. Ziabari
University of Southern California
Natural Language Processing, Machine Learning
A. Davani
Google
Preni Golazizian
University of Southern California
Farzan Karimi-Malekabadi
University of Southern California
Morality, Culture, Large Language Models
Ali Omrani
Snap Inc.
Zhihe Li
University of Southern California
Brendan Kennedy
N. K. Reimer
University of Southern California
M. Reyes
University of Southern California
Kelsey Cheng
University of Southern California
Mellow Wei
University of Southern California
Christina Merrifield
University of Southern California
Arta Khosravi
University of Southern California
E. Álvarez
University of Southern California
Morteza Dehghani
University of Southern California