🤖 AI Summary
This work addresses the limited high-level reasoning capabilities of existing agricultural robots, which hinder their effectiveness in complex crop monitoring tasks. The authors propose a modular task planning framework that uses a vision-language model (VLM) to guide horticultural robots through interleaved visual-language queries and action primitives for intelligent decision-making. They establish the first benchmark for short- and long-horizon crop monitoring in both monoculture and polyculture environments. Experimental results show that the approach achieves near-human performance on short-horizon tasks but degrades significantly on long-horizon tasks that depend on noisy semantic maps, highlighting the need for robust semantic mapping in VLM-driven agricultural automation.
📝 Abstract
Crop monitoring is essential for precision agriculture, but current systems lack high-level reasoning. We introduce a novel, modular framework that uses a Visual Language Model (VLM) to guide robotic task planning, interleaving input queries with action primitives. We contribute a comprehensive benchmark for short- and long-horizon crop monitoring tasks in monoculture and polyculture environments. Our main results show that VLMs perform robustly for short-horizon tasks (comparable to human success), but exhibit significant performance degradation in challenging long-horizon tasks. Critically, the system fails when relying on noisy semantic maps, demonstrating a key limitation in current VLM context grounding for sustained robotic operations. This work offers a deployable framework and critical insights into VLM capabilities and shortcomings for complex agricultural robotics.
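To make the interleaving of visual-language queries and action primitives concrete, below is a minimal Python sketch of such a planning loop. The `PlanStep` type and the `vlm.decide`/`vlm.answer` and `robot.capture`/`robot.execute` interfaces are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch (hypothetical interfaces) of a VLM-guided planning loop that
# interleaves visual-language queries with robot action primitives.
from dataclasses import dataclass


@dataclass
class PlanStep:
    kind: str     # "query" (ask about the scene), "action" (execute a primitive), or "done"
    payload: str  # question text or primitive name, e.g. "move_to(plant_3)"


def monitor_crops(vlm, robot, task: str, max_steps: int = 20) -> list[str]:
    """Drive the robot through a crop-monitoring task via interleaved VLM calls.

    `vlm` and `robot` are assumed objects exposing the hypothetical methods
    used below; they stand in for whatever VLM backend and robot stack is used.
    """
    history: list[str] = []
    for _ in range(max_steps):
        image = robot.capture()                             # current camera view
        step: PlanStep = vlm.decide(image, task, history)   # VLM picks the next query or action
        if step.kind == "query":
            answer = vlm.answer(image, step.payload)         # visual-language query about the scene
            history.append(f"Q: {step.payload} -> A: {answer}")
        elif step.kind == "action":
            robot.execute(step.payload)                      # e.g. navigate, inspect, log observation
            history.append(f"ACT: {step.payload}")
        else:                                                # "done" or unrecognized -> stop
            break
    return history
```

In this reading of the framework, the VLM alternates between gathering information (queries grounded in the current image) and committing to action primitives, with the accumulated history serving as context for long-horizon decisions; in long-term settings that history would additionally draw on a semantic map, which is where the reported degradation arises.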