Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions

📅 2025-11-24
🤖 AI Summary
Existing affect lexicons such as the NRC VAD Lexicon (2018) provide Valence, Arousal, and Dominance (VAD) ratings for individual words, which limits phrase-level analysis of English multiword expressions (MWEs). This work presents the NRC VAD Lexicon v2, which adds human ratings of valence, arousal, and dominance for 10k English MWEs and their constituent words, and extends unigram coverage, especially for words that have become more common since 2018. In all, v2 has entries for 10k MWEs and 25k words, in addition to the entries in v1, and the authors show that the ratings are highly reliable. The lexicon is used to examine two properties of MWEs (idioms, noun compounds, and verb particle constructions): how strongly they exhibit emotionality, and how emotionally compositional they are. The resource is freely available and supports research in NLP, Psychology, Public Health, Digital Humanities, and Social Sciences.

📝 Abstract
Factor analysis studies have shown that the primary dimensions of word meaning are Valence (V), Arousal (A), and Dominance (D). Existing lexicons such as the NRC VAD Lexicon, published in 2018, include VAD association ratings for words. Here, we present a complement to it, which has human ratings of valence, arousal, and dominance for 10k English Multiword Expressions (MWEs) and their constituent words. We also increase the coverage of unigrams, especially words that have become more common since 2018. In all, the new NRC VAD Lexicon v2 now has entries for 10k MWEs and 25k words, in addition to the entries in v1. We show that the associations are highly reliable. We use the lexicon to examine emotional characteristics of MWEs, including: 1. The degree to which MWEs (idioms, noun compounds, and verb particle constructions) exhibit strong emotionality; 2. The degree of emotional compositionality in MWEs. The lexicon enables a wide variety of research in NLP, Psychology, Public Health, Digital Humanities, and Social Sciences. The NRC VAD Lexicon v2 is freely available through the project webpage: http://saifmohammad.com/WebPages/nrc-vad.html
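One way to make "emotional compositionality" concrete is to compare an MWE's rating with the mean rating of its constituent words. The sketch below is only an illustration under stated assumptions: the abstract does not say how the paper measures compositionality, the function name `compositionality_gap` is hypothetical, and the scores in `toy_valence` are invented placeholders, not values from the NRC VAD Lexicon.

```python
from statistics import mean

# Invented placeholder scores for illustration only.
# These are NOT entries from the NRC VAD Lexicon.
toy_valence = {
    "hot": 0.1,
    "dog": 0.5,
    "hot dog": 0.4,
}


def compositionality_gap(mwe, scores):
    """Absolute difference between an MWE's score and the mean score
    of its whitespace-separated constituents.

    Returns None if the MWE or any constituent has no entry.
    A small gap suggests the expression's affect is close to the
    average of its parts; a large gap suggests it is not.
    """
    words = mwe.split()
    if mwe not in scores or any(w not in scores for w in words):
        return None
    return abs(scores[mwe] - mean(scores[w] for w in words))


gap = compositionality_gap("hot dog", toy_valence)
missing = compositionality_gap("cold cat", toy_valence)
```

The same function could be applied separately to valence, arousal, and dominance dictionaries. Averaging constituents is an assumption made here for simplicity; other combinations (for example, the most extreme constituent) are equally plausible baselines.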

Problem

Research questions and friction points this paper is trying to address.

Creating human valence, arousal, and dominance ratings for 10k English multiword expressions and their constituent words
Expanding lexicon coverage with 25k additional unigram entries
Analyzing emotional characteristics and compositionality of MWEs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Expanded lexicon with 10k multiword expression ratings
Added 25k new unigram entries for broader coverage
Provided reliable human annotations of valence, arousal, and dominance