Visual-Language-Guided Task Planning for Horticultural Robots

📅 2026-01-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited high-level reasoning capabilities of existing agricultural robots, which hinder their effectiveness in complex crop-monitoring tasks. The authors propose a modular task-planning framework in which a vision-language model (VLM) guides horticultural robots through interleaved visual–language queries and action primitives for intelligent decision-making. They establish the first benchmark for short- and long-horizon crop monitoring in both monoculture and polyculture environments. Experiments show the approach achieves near-human performance in short-horizon scenarios but degrades notably in long-horizon settings that rely on noisy semantic maps, highlighting the necessity of robust semantic mapping for VLM-driven agricultural automation.

📝 Abstract
Crop monitoring is essential for precision agriculture, but current systems lack high-level reasoning. We introduce a novel, modular framework that uses a Visual Language Model (VLM) to guide robotic task planning, interleaving input queries with action primitives. We contribute a comprehensive benchmark for short- and long-horizon crop monitoring tasks in monoculture and polyculture environments. Our main results show that VLMs perform robustly for short-horizon tasks (comparable to human success), but exhibit significant performance degradation in challenging long-horizon tasks. Critically, the system fails when relying on noisy semantic maps, demonstrating a key limitation in current VLM context grounding for sustained robotic operations. This work offers a deployable framework and critical insights into VLM capabilities and shortcomings for complex agricultural robotics.
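The abstract describes a loop that interleaves VLM queries with action primitives: the planner repeatedly shows the model the current observation and asks for the next primitive to execute. A minimal sketch of that control flow is below; all names (`query_vlm`, `ACTION_PRIMITIVES`, the goal fields) are illustrative assumptions, and the VLM call is replaced by a rule-based stand-in, not the paper's actual model or prompts.

```python
# Hypothetical sketch of an interleaved VLM-query / action-primitive loop.
# The real system would call a vision-language model; here query_vlm is a
# deterministic stand-in so the control flow is runnable.

ACTION_PRIMITIVES = {
    "navigate_to": lambda state, target: {**state, "location": target},
    "capture_image": lambda state, _: {**state, "images": state["images"] + 1},
    "report": lambda state, note: {**state, "report": note},
}

def query_vlm(observation, goal):
    """Stand-in for a VLM call: maps the current observation and goal to the
    next (primitive, argument) pair, or None when the goal is satisfied."""
    if observation["location"] != goal["plot"]:
        return ("navigate_to", goal["plot"])
    if observation["images"] < goal["min_images"]:
        return ("capture_image", None)
    if observation["report"] is None:
        return ("report", f"monitored {goal['plot']}")
    return None  # nothing left to do -> stop planning

def plan_and_execute(goal, max_steps=10):
    # Robot state: where it is, how many images it has taken, its report.
    state = {"location": "base", "images": 0, "report": None}
    for _ in range(max_steps):
        decision = query_vlm(state, goal)
        if decision is None:
            break
        name, arg = decision
        state = ACTION_PRIMITIVES[name](state, arg)  # execute the primitive
    return state

final = plan_and_execute({"plot": "row_3", "min_images": 2})
print(final)
```

Bounding the loop with `max_steps` matters in practice: the paper's long-horizon failures suggest the model can loop on noisy map input, so a step budget keeps execution finite.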
Problem

Research questions and friction points this paper is trying to address.

visual-language model
task planning
horticultural robots
crop monitoring
long-horizon tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Language Model
Task Planning
Agricultural Robotics
Semantic Grounding
Benchmark
José Cuarán
Siebel School of Computing and Data Science, University of Illinois, Urbana-Champaign
Kendall Koe
Siebel School of Computing and Data Science, University of Illinois, Urbana-Champaign
Aditya Potnis
Siebel School of Computing and Data Science, University of Illinois, Urbana-Champaign
N. Uppalapati
National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign
Girish Chowdhary
Associate Professor
Robotics · Agricultural Robotics · Adaptive Control · Mobile Robotics