Do Schwartz Higher-Order Values Help Sentence-Level Human Value Detection? When Hard Gating Hurts

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether Schwartz’s higher-order value structure facilitates sentence-level human value detection and evaluates the efficacy of hard-gated mechanisms under stringent computational constraints. It compares directly supervised models against hard-gated hierarchical pipelines and cascaded architectures, augmented with low-cost enhancements such as lexicons, short-context features, and topic cues. Results indicate that higher-order value categories can be effectively learned from single sentences, achieving a Macro-F₁ of approximately 0.58 for the best bipolar pair. However, hard gating often degrades performance due to error propagation and suppressed recall. In contrast, label-level threshold tuning (+0.05) and lightweight ensembling (+0.02) consistently improve results. The findings suggest that while Schwartz’s higher-order structure provides a useful descriptive framework, it should not be imposed as a rigid architectural constraint.

📝 Abstract
Sentence-level human value detection is typically framed as multi-label classification over Schwartz values, but it remains unclear whether Schwartz higher-order (HO) categories provide usable structure. We study this under a strict compute-frugal budget (single 8 GB GPU) on ValueEval'24 / ValuesML (74K English sentences). We compare (i) direct supervised transformers, (ii) HO$\rightarrow$values pipelines that enforce the hierarchy with hard masks, and (iii) Presence$\rightarrow$HO$\rightarrow$values cascades, alongside low-cost add-ons (lexica, short context, topics), label-wise threshold tuning, small instruction-tuned LLM baselines ($\le$10B), QLoRA, and simple ensembles. HO categories are learnable from single sentences (e.g., the easiest bipolar pair reaches Macro-$F_1\approx0.58$), but hard hierarchical gating is not a reliable win: it often reduces end-task Macro-$F_1$ via error compounding and recall suppression. In contrast, label-wise threshold tuning is a high-leverage knob (up to $+0.05$ Macro-$F_1$), and small transformer ensembles provide the most consistent additional gains (up to $+0.02$ Macro-$F_1$). Small LLMs lag behind supervised encoders as stand-alone systems, yet can contribute complementary errors in cross-family ensembles. Overall, HO structure is useful descriptively, but enforcing it with hard gates hurts sentence-level value detection; robust improvements come from calibration and lightweight ensembling.
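The abstract highlights label-wise threshold tuning as a high-leverage knob (up to +0.05 Macro-F1). A minimal sketch of the general idea, under the assumption that the paper tunes one decision threshold per label on dev-set probabilities (function names and data here are hypothetical, not the authors' code):

```python
# Hypothetical sketch: per-label threshold tuning for multi-label F1.
# Instead of a global 0.5 cutoff, sweep a grid of thresholds for each
# label on dev-set probabilities and keep the one maximizing that
# label's F1. Data and names are illustrative, not from the paper.
import numpy as np

def f1(y_true, y_pred):
    # Binary F1 for a single label column.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def tune_thresholds(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """probs, labels: (n_sentences, n_labels) dev-set arrays.
    Returns one tuned cutoff per label."""
    return np.array([
        max(grid, key=lambda t: f1(labels[:, j], (probs[:, j] >= t).astype(int)))
        for j in range(probs.shape[1])
    ])

# Toy example with two synthetic labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=(200, 2))
p = np.clip(y * 0.6 + rng.normal(0.2, 0.15, size=y.shape), 0.0, 1.0)
thresholds = tune_thresholds(p, y)
print(thresholds)  # one cutoff per label, tuned independently
```

Tuning each label independently is what makes this cheap: it needs only dev-set probabilities, no retraining, which fits the paper's compute-frugal framing.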
Problem

Research questions and friction points this paper is trying to address.

Schwartz values
human value detection
higher-order categories
sentence-level classification
hierarchical structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical value detection
hard gating
label-wise threshold tuning
lightweight ensembling
compute-frugal NLP
Víctor Yeste
PRHLT Research Center, Universitat Politècnica de València, Valencia, 46022, Spain; School of Science, Engineering and Design, Universidad Europea de Valencia, Valencia, 46010, Spain
Paolo Rosso
Full Professor, Computer Science, Universitat Politècnica de València
Natural Language Processing, Fake News detection, Hate Speech detection, Irony detection, Artificial Intelligence