Mastering Negation: Boosting Grounding Models via Grouped Opposition-Based Learning

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge that existing vision-language grounding models struggle to accurately interpret and localize complex instructions containing negation, largely due to a scarcity of high-quality negative-semantic training data. To this end, the authors introduce D-Negation, a dataset of objects annotated with paired positive and negative semantic descriptions, and propose a grouped opposition-based learning framework. The framework organizes semantically opposing descriptions into structured groups, trains with two complementary loss functions that target negation and semantic qualifiers, and uses parameter-efficient fine-tuning (updating fewer than 10% of model parameters) to strengthen the model's comprehension of negation. Experiments show gains of up to 4.4 mAP and 5.7 mAP on positive and negative semantic grounding tasks, respectively, improving both robustness and localization accuracy.

📝 Abstract
Current vision-language detection and grounding models predominantly focus on prompts with positive semantics and often struggle to accurately interpret and ground complex expressions containing negative semantics. A key reason for this limitation is the lack of high-quality training data that explicitly captures discriminative negative samples and negation-aware language descriptions. To address this challenge, we introduce D-Negation, a new dataset that provides objects annotated with both positive and negative semantic descriptions. Building upon the observation that negation reasoning frequently appears in natural language, we further propose a grouped opposition-based learning framework that learns negation-aware representations from limited samples. Specifically, our method organizes opposing semantic descriptions from D-Negation into structured groups and formulates two complementary loss functions that encourage the model to reason about negation and semantic qualifiers. We integrate the proposed dataset and learning strategy into a state-of-the-art language-based grounding model. By fine-tuning fewer than 10 percent of the model parameters, our approach achieves improvements of up to 4.4 mAP and 5.7 mAP on positive and negative semantic evaluations, respectively. These results demonstrate that explicitly modeling negation semantics can substantially enhance the robustness and localization accuracy of vision-language grounding models.
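The page does not reproduce the paper's actual loss formulations, but the idea of contrasting a grounded region against a structured group of opposing descriptions can be illustrated with a toy sketch. Everything below is an assumption for illustration: the function name `grouped_contrastive_loss`, the NumPy implementation, and the InfoNCE-style softmax cross-entropy are not taken from the paper, which uses two complementary losses whose exact form is not given here.

```python
import numpy as np

def grouped_contrastive_loss(region_emb, text_embs, target_idx, temperature=0.07):
    """Toy grouped contrastive objective (assumed form, not the paper's).

    region_emb : (d,) visual embedding of the grounded object
    text_embs  : (k, d) embeddings of one opposition group, i.e. the
                 matching description plus its negated counterparts
    target_idx : index of the description that truly matches the region
    """
    # L2-normalize so dot products become cosine similarities
    r = region_emb / np.linalg.norm(region_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = t @ r / temperature
    # softmax cross-entropy toward the matching description within the group
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[target_idx])

# toy example: a group of 2 descriptions (positive + its negation), d = 4
rng = np.random.default_rng(0)
region = rng.normal(size=4)
texts = np.stack([
    region + 0.1 * rng.normal(size=4),    # description aligned with the region
    -region + 0.1 * rng.normal(size=4),   # its semantically opposed (negated) counterpart
])
loss = grouped_contrastive_loss(region, texts, target_idx=0)
```

Because the softmax is taken only within a group of mutually opposing descriptions, the model is pushed to separate a phrase from its negation rather than from arbitrary unrelated captions, which is the intuition behind grouping in the abstract.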
Problem

Research questions and friction points this paper is trying to address.

negation
vision-language grounding
negative semantics
object grounding
language-based detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

negation-aware learning
opposition-based learning
vision-language grounding
negative semantics
D-Negation dataset
Zesheng Yang
Department of Computer Science, Southern University of Science and Technology, Shenzhen, China
Xi Jiang
Southern University of Science and Technology
Computer Vision · Deep Learning
Bingzhang Hu
Chinese Academy of Sciences
Computer Vision · Machine Learning
Weili Guan
School of Information Science and Technology, Harbin Institute of Technology (Shenzhen), China
Runmin Cong
School of Control Science and Engineering, Shandong University, Jinan, China; Key Laboratory of Machine Intelligence and System Control, Ministry of Education, China
Guo-Jun Qi
Research Center for Industries of the Future and the School of Engineering, Westlake University, Hangzhou, China; OPPO Research, Seattle, WA, USA
Feng Zheng
Southern University of Science and Technology; Spatialtemporal AI
Embodied Intelligence · Spatialtemporal AI · Computer Vision