Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding

📅 2025-06-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing benchmarks treat negation as a peripheral phenomenon within broader tasks such as natural language inference, leaving no dedicated evaluation of sentence-level negation understanding. Method: We introduce Thunder-NUBench, a structured benchmark explicitly designed to assess large language models' (LLMs) sentence-level negation comprehension. It contrasts standard negation with semantically related variants, including local negation, contradiction, and paraphrase, and employs human-annotated sentence pairs together with multiple-choice discrimination tasks for fine-grained, semantically grounded evaluation. Contribution/Results: Thunder-NUBench isolates negation understanding as a primary objective rather than a byproduct, introducing structured contrast with negative exemplars. It provides a high-quality, reproducible, and interpretable quantitative evaluation framework. Experiments reveal systematic failures of mainstream LLMs in negation reasoning, establishing Thunder-NUBench as a diagnostic and improvement tool for negation-aware model development.

📝 Abstract
Negation is a fundamental linguistic phenomenon that poses persistent challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Existing benchmarks often treat negation as a side case within broader tasks like natural language inference, resulting in a lack of benchmarks that exclusively target negation understanding. In this work, we introduce Thunder-NUBench, a novel benchmark explicitly designed to assess sentence-level negation understanding in LLMs. Thunder-NUBench goes beyond surface-level cue detection by contrasting standard negation with structurally diverse alternatives such as local negation, contradiction, and paraphrase. The benchmark consists of manually curated sentence-negation pairs and a multiple-choice dataset that enables in-depth evaluation of models' negation understanding.
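The contrast types described above can be illustrated with a hypothetical multiple-choice item. The field names, sentences, and schema below are illustrative assumptions for exposition only, not the benchmark's actual data format:

```python
# Hypothetical Thunder-NUBench-style item: given a premise, the model must
# identify the option that expresses its standard (sentence-level) negation,
# rather than a local negation, contradiction, or paraphrase.
# All field names and sentences here are invented for illustration.
item = {
    "premise": "The committee approved the proposal.",
    "options": {
        "standard_negation": "The committee did not approve the proposal.",
        "local_negation": "The committee approved the non-binding proposal.",
        "contradiction": "The committee rejected every proposal it received.",
        "paraphrase": "The proposal was approved by the committee.",
    },
    "answer": "standard_negation",
}

def is_correct(predicted_key: str, item: dict) -> bool:
    """Score one multiple-choice prediction against the gold answer key."""
    return predicted_key == item["answer"]

print(is_correct("standard_negation", item))  # True
print(is_correct("paraphrase", item))         # False
```

The point of the contrast set is that distractors like the paraphrase share surface cues or lexical overlap with the correct answer, so a model cannot succeed by cue detection alone.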
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' sentence-level negation understanding
Addressing lack of negation-focused benchmarks
Evaluating deep semantic negation comprehension
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Thunder-NUBench for negation understanding
Contrasts standard negation with diverse alternatives
Uses manually curated sentence-negation pairs
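Evaluation over such multiple-choice items reduces to accuracy over predicted option keys. A minimal sketch (not the paper's official evaluation harness, and assuming the hypothetical item schema with an `"answer"` key):

```python
def accuracy(predictions: list[str], items: list[dict]) -> float:
    """Fraction of items where the predicted option key matches the gold answer."""
    if not items:
        return 0.0
    correct = sum(pred == item["answer"] for pred, item in zip(predictions, items))
    return correct / len(items)

# Toy run: two items, one correct prediction.
items = [{"answer": "standard_negation"}, {"answer": "standard_negation"}]
preds = ["standard_negation", "paraphrase"]
print(accuracy(preds, items))  # 0.5
```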
👥 Authors
Yeonkyoung So, Graduate School of Data Science, Seoul National University
Gyuseong Lee, LG Electronics (Computer Vision, Machine Learning)
Sungmok Jung, Graduate School of Data Science, Seoul National University
Joonhak Lee, Graduate School of Data Science, Seoul National University
JiA Kang, Graduate School of Data Science, Seoul National University
Sangho Kim, Associate Professor of Biomedical Engineering, National University of Singapore (Blood Rheology, Microcirculation, Hemodynamics, Gas Transport)
Jaejin Lee, Graduate School of Data Science, Seoul National University