🤖 AI Summary
Existing benchmarks treat negation as a peripheral phenomenon in tasks such as natural language inference, leaving no dedicated evaluation of sentence-level negation understanding. Method: We introduce Thunder-NUBench, a structured benchmark explicitly designed to assess large language models' (LLMs) sentence-level negation comprehension. It covers semantic variants including standard negation, local negation, contradiction, and paraphrase, and employs human-annotated sentence pairs with multiple-choice discrimination tasks for fine-grained, semantically grounded evaluation. Contribution/Results: Thunder-NUBench is the first framework to isolate negation understanding as a primary objective rather than a byproduct, and it introduces structured contrast against negative exemplars. It provides a high-quality, reproducible, and interpretable quantitative evaluation framework. Experiments reveal systematic failures of mainstream LLMs in negation reasoning, establishing Thunder-NUBench as a diagnostic and improvement tool for negation-aware model development.
📝 Abstract
Negation is a fundamental linguistic phenomenon that poses persistent challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Existing benchmarks often treat negation as a side case within broader tasks like natural language inference, resulting in a lack of benchmarks that exclusively target negation understanding. In this work, we introduce Thunder-NUBench, a novel benchmark explicitly designed to assess sentence-level negation understanding in LLMs. Thunder-NUBench goes beyond surface-level cue detection by contrasting standard negation with structurally diverse alternatives such as local negation, contradiction, and paraphrase. The benchmark consists of manually curated sentence-negation pairs and a multiple-choice dataset that enables in-depth evaluation of models' negation understanding.
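To make the multiple-choice discrimination setup concrete, here is a minimal sketch of how such a benchmark item might be represented and scored. The field names, example sentences, and the baseline predictor are illustrative assumptions, not the released Thunder-NUBench schema or data.

```python
# Hypothetical sketch of a Thunder-NUBench-style item: given a sentence,
# the model must pick the true (standard) negation from distractors that
# include local negation, contradiction, and paraphrase.
from dataclasses import dataclass

@dataclass
class NegationItem:
    sentence: str            # original affirmative sentence
    choices: dict[str, str]  # alternative type -> candidate sentence
    answer: str              # key of the correct standard negation

# Illustrative example item (invented, not from the actual dataset).
item = NegationItem(
    sentence="The committee approved the proposal.",
    choices={
        "standard_negation": "The committee did not approve the proposal.",
        "local_negation": "The committee approved the proposal, but not unanimously.",
        "contradiction": "The committee rejected the proposal.",
        "paraphrase": "The proposal was approved by the committee.",
    },
    answer="standard_negation",
)

def evaluate(predict, items):
    """Accuracy of a predictor that maps (sentence, choices) to a choice key."""
    correct = sum(predict(it.sentence, it.choices) == it.answer for it in items)
    return correct / len(items)

# A trivial baseline that always guesses "contradiction" scores 0.0 here,
# showing how the distractors penalize shallow negation-cue heuristics.
baseline = lambda sentence, choices: "contradiction"
print(evaluate(baseline, [item]))  # 0.0
```

The design point this illustrates is that contradictions and paraphrases act as hard negatives: a model that merely spots negation cues such as "not" (which also appears in the local-negation distractor) cannot reliably separate the standard negation from the alternatives.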