AI Summary
Existing datasets offer little support for understanding human values in news text, where values are event-centric, tied to specific actors, and directional. To address this gap, this work proposes NEVU, the first benchmark for news-based, actor-conditioned, multi-granular, and direction-aware value understanding. From 2,865 English-language news articles, the authors construct a dataset of 45,793 (semantic unit, actor) pairs and 168,061 directed value instances, annotated through an LLM-assisted pipeline with staged verification and targeted human auditing. The value space is a hierarchical taxonomy of 54 fine-grained values organized under 20 coarse-grained categories. Experiments show that lightweight LoRA fine-tuning consistently improves the performance of open-source large language models on this task.
Abstract
Existing human value datasets do not directly support value understanding in factual news: many are actor-agnostic, rely on isolated utterances or synthetic scenarios, and lack explicit event structure or value direction. We present \textbf{NEVU} (\textbf{N}ews \textbf{E}vent-centric \textbf{V}alue \textbf{U}nderstanding), a benchmark for \emph{actor-conditioned}, \emph{event-centric}, and \emph{direction-aware} human value recognition in factual news. NEVU evaluates whether models can identify value cues, attribute them to the correct actor, and determine value direction from grounded evidence. Built from 2{,}865 English news articles, NEVU organizes annotations at four semantic-unit levels (\textbf{Subevent}, \textbf{behavior-based composite event}, \textbf{story-based composite event}, and \textbf{Article}) and labels \mbox{(unit, actor)} pairs for fine-grained evaluation across local and composite contexts. The annotations are produced through an LLM-assisted pipeline with staged verification and targeted human auditing. Using a hierarchical value space with \textbf{54} fine-grained values and \textbf{20} coarse-grained categories, NEVU covers 45{,}793 unit--actor pairs and 168{,}061 directed value instances. We provide unified baselines for proprietary and open-source LLMs and find that lightweight adaptation (LoRA) consistently improves open-source models. Thus, although NEVU is designed primarily as a benchmark, it also supports supervised adaptation beyond prompting-only evaluation. Data availability is described in Appendix~\ref{app:data_code_availability}.
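To make the annotation structure described in the abstract concrete, the following is a minimal Python sketch of how one (unit, actor) pair with directed value instances might be represented. It is illustrative only and not the released NEVU schema. The class and field names, the two direction labels (`PROMOTES`, `OPPOSES`), and the value and category names in `FINE_TO_COARSE` are hypothetical placeholders. Only the four unit levels and the fine-to-coarse hierarchy come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set, Tuple


class UnitLevel(Enum):
    # The four semantic-unit levels named in the abstract.
    SUBEVENT = "subevent"
    BEHAVIOR_COMPOSITE = "behavior-based composite event"
    STORY_COMPOSITE = "story-based composite event"
    ARTICLE = "article"


class Direction(Enum):
    # Hypothetical direction labels; the abstract does not list the actual ones.
    PROMOTES = "promotes"
    OPPOSES = "opposes"


@dataclass(frozen=True)
class DirectedValue:
    fine_value: str        # one of the 54 fine-grained values
    direction: Direction   # direction of the value for this actor


@dataclass
class UnitActorPair:
    unit_id: str
    level: UnitLevel
    actor: str
    values: List[DirectedValue] = field(default_factory=list)


# Hypothetical slice of the fine-to-coarse mapping (54 fine values, 20 coarse
# categories in the real taxonomy); these names are placeholders.
FINE_TO_COARSE = {
    "public_safety": "security",
    "fair_treatment": "justice",
}


def coarse_labels(pair: UnitActorPair) -> Set[Tuple[str, Direction]]:
    """Project a pair's fine-grained directed values onto coarse categories."""
    return {(FINE_TO_COARSE[v.fine_value], v.direction) for v in pair.values}
```

Under these assumptions, a model prediction for a pair could be scored at either granularity: directly on the `(fine_value, direction)` tuples, or on the set returned by `coarse_labels`.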