The Moral Foundations Reddit Corpus

📅 2022-08-10

🏛️ arXiv.org

📈 Citations: 19

✨ Influential: 7

🤖 AI Summary

Problem: Existing methods for identifying moral sentiment rely heavily on manually annotated data, yet prior annotated corpora are limited to Twitter. Method: We construct the Moral Foundations Reddit Corpus, a collection of 16,123 comments drawn from 12 subreddits, each hand-annotated by at least three trained annotators for eight categories of moral sentiment based on the updated Moral Foundations Theory (MFT) framework: Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, and Implicit/Explicit Morality. Contribution/Results: We report baseline moral-sentiment classification results obtained with a range of methods, including cross-domain classification and knowledge transfer, providing a Reddit-based benchmark for moral rhetoric analysis alongside the earlier Twitter corpora.

📝 Abstract

Moral framing and sentiment can affect a variety of online and offline behaviors, including donation, pro-environmental action, political engagement, and even participation in violent protests. Various computational methods in Natural Language Processing (NLP) have been used to detect moral sentiment from textual data, but achieving better performance on such subjective tasks requires large sets of hand-annotated training data. Previous corpora annotated for moral sentiment have proven valuable and have generated new insights both within NLP and across the social sciences, but they have been limited to Twitter. To help improve our understanding of the role of moral rhetoric, we present the Moral Foundations Reddit Corpus, a collection of 16,123 Reddit comments curated from 12 distinct subreddits and hand-annotated by at least three trained annotators for 8 categories of moral sentiment (i.e., Care, Proportionality, Equality, Purity, Authority, Loyalty, Thin Morality, Implicit/Explicit Morality) based on the updated Moral Foundations Theory (MFT) framework. We use a range of methodologies to provide baseline moral-sentiment classification results for this new corpus, e.g., cross-domain classification and knowledge transfer.
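
Because each comment carries labels from at least three annotators, anyone training a classifier on such a corpus has to decide how to combine them into one label set per comment. The sketch below shows one common option, a strict majority vote per category. The function name `majority_labels`, the set-per-annotator input format, and the majority rule itself are assumptions made for illustration; the abstract does not say how the corpus's annotations are aggregated.

```python
from collections import Counter

# Category names as listed in the abstract.
CATEGORIES = [
    "Care", "Proportionality", "Equality", "Purity",
    "Authority", "Loyalty", "Thin Morality", "Implicit/Explicit Morality",
]


def majority_labels(annotations, min_annotators=3):
    """Return the categories chosen by a strict majority of annotators.

    `annotations` holds one collection of category names per annotator.
    Hypothetical helper: the majority rule is an illustrative assumption,
    not the aggregation procedure documented for the corpus.
    """
    if len(annotations) < min_annotators:
        raise ValueError(f"expected at least {min_annotators} annotators")

    # Count each category at most once per annotator.
    counts = Counter(label for labels in annotations for label in set(labels))

    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")

    # Keep a category only if more than half of the annotators selected it.
    threshold = len(annotations) / 2
    return [c for c in CATEGORIES if counts[c] > threshold]


if __name__ == "__main__":
    example = [{"Care", "Equality"}, {"Care"}, {"Care", "Equality", "Authority"}]
    print(majority_labels(example))
```

A stricter rule (unanimity) or a softer one (keeping per-category vote fractions as soft labels) fits the same structure by changing only the threshold step.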

Problem

Research questions and friction points this paper is trying to address.

Creating a moral sentiment dataset from Reddit comments

Evaluating LLM performance on moral sentiment classification

Addressing limitations of existing Twitter-based moral corpora

Innovation

Methods, ideas, or system contributions that make the work stand out.

Created Reddit corpus with moral sentiment annotations

Evaluated LLMs versus fine-tuned BERT models

Used updated Moral Foundations Theory framework

🔎 Similar Papers

A Survey on Moral Foundation Theory and Pre-Trained Language Models: Current Advances and Challenges