Multi-label Instance-level Generalised Visual Grounding in Agriculture

📅 2026-03-05
🤖 AI Summary
This study addresses the lack of a visual grounding benchmark for agricultural settings that includes negative expressions, that is, expressions whose referred target is absent from the image. Without such a benchmark, existing models cannot be properly evaluated on localizing crops and weeds in field images, where plants look highly similar and appear at multiple scales. To bridge this gap, the authors introduce gRef-CW, the first dataset designed for generalised visual grounding in agriculture, along with Weed-VG, a modular framework that performs instance-level localization through multi-label hierarchical relevance scoring and interpolation-driven regression. Benchmarking current state-of-the-art grounding models on gRef-CW reveals a substantial domain gap, while Weed-VG advances instance-level grounding of crops and weeds and provides a clear baseline for visual grounding methods in precision agriculture.

📝 Abstract
Understanding field imagery, including detecting plants and distinguishing individual crop and weed instances, is a central challenge in precision agriculture. Despite progress in vision-language tasks like captioning and visual question answering, Visual Grounding (VG), the task of localising language-referred objects, remains unexplored in agriculture. A key reason is the lack of suitable benchmark datasets for evaluating grounding models in field conditions, where many plants look highly similar and appear at multiple scales, and where the referred target may be absent from the image. To address these limitations, we introduce gRef-CW, the first dataset designed for generalised visual grounding in agriculture, including negative expressions. Benchmarking current state-of-the-art grounding models on gRef-CW reveals a substantial domain gap, highlighting their inability to ground instances of crops and weeds. Motivated by these findings, we introduce Weed-VG, a modular framework that incorporates multi-label hierarchical relevance scoring and interpolation-driven regression. Weed-VG advances instance-level visual grounding and provides a clear baseline for developing VG methods in precision agriculture. Code will be released upon acceptance.
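To make the "generalised" setting concrete, here is a minimal sketch of how a single prediction might be scored when an expression can refer to zero, one, or several instances. The abstract does not describe the gRef-CW evaluation protocol, so everything here is an assumption for illustration: the (x1, y1, x2, y2) box format, the 0.5 IoU threshold, the greedy one-to-one matching, and the hypothetical function names `iou` and `sample_is_correct`.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_is_correct(pred: List[Box], gt: List[Box], thr: float = 0.5) -> bool:
    """Hypothetical per-expression check for generalised grounding.

    - Negative expression (no ground-truth boxes): correct only if the
      model predicts nothing.
    - Otherwise: correct only if the prediction count equals the target
      count and every prediction greedily matches a distinct target
      with IoU >= thr (an assumed threshold).
    """
    if not gt:
        return not pred
    if len(pred) != len(gt):
        return False
    unmatched = list(gt)
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g))
        if iou(p, best) < thr:
            return False
        unmatched.remove(best)
    return True


if __name__ == "__main__":
    weed = (0.0, 0.0, 10.0, 10.0)
    print(sample_is_correct([], []))                            # absent target, nothing predicted
    print(sample_is_correct([weed], []))                        # absent target, false detection
    print(sample_is_correct([weed], [(1.0, 1.0, 10.0, 10.0)]))  # single overlapping target
```

The point of the sketch is the first branch: in a benchmark with negative expressions, a model that always returns a box is penalised whenever the referred plant is not in the image.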
Problem

Research questions and friction points this paper is trying to address.

Visual Grounding
Precision Agriculture
Instance-level Grounding
Crop-Weed Differentiation
Benchmark Dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Grounding
Multi-label Instance-level
Agricultural Benchmark Dataset
Hierarchical Relevance Scoring
Interpolation-driven Regression
Mohammadreza Haghighat
College of Science and Engineering, James Cook University, Townsville, QLD, Australia; Centre for AI and Data Science Innovation, James Cook University, Townsville, QLD, Australia
Alzayat Saleh
College of Science and Engineering, James Cook University, Townsville, QLD, Australia; Centre for AI and Data Science Innovation, James Cook University, Townsville, QLD, Australia
Mostafa Rahimi Azghadi
Professor, Electronics & Computer Engineering, James Cook University
Neuromorphic Computing, Deep Learning, Machine Learning