MAgSeg: Segmentation of Agricultural Landscapes in High-Resolution Satellite Imagery using Multimodal Large Language Models

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenges of segmenting smallholder agricultural landscapes in the Global South from high-resolution remote sensing imagery, where fragmented field boundaries, high intra-class variability, and scarce annotations hinder performance. To overcome these issues, the authors propose a novel multimodal large language model (MLLM) segmentation approach that operates without a visual decoder. By introducing a new instruction-tuning data format, the model leverages global contextual information to generate textual tokens for local image patches, effectively circumventing limitations imposed by context length and domain misalignment. Experiments on datasets from three Global South countries demonstrate that the proposed method significantly outperforms existing MLLM baselines, offering an accurate and scalable solution for mapping smallholder farmland.

📝 Abstract

Agricultural landscape segmentation in the Global South is challenging because these landscapes are characterized by fragmented plots, high intra-class variance, and scarce labeled training data. Multimodal Large Language Models (MLLMs) have driven recent advances in segmentation. However, current approaches encounter critical context-length bottlenecks and a domain alignment gap in understanding satellite features. We address these limitations through MAgSeg, a novel, decoder-free MLLM segmentation approach. MAgSeg is architecturally efficient: it enables standard MLLMs to segment complex smallholder agricultural landscapes from high-resolution satellite imagery without requiring auxiliary vision decoders. We introduce a novel instruction-tuning data format designed for scalable fine-tuning and post-training on high-resolution satellite imagery; it lets MAgSeg learn from the global context of the image while generating text tokens for only one patch within the image. Extensive evaluations on datasets spanning three countries in the Global South demonstrate that MAgSeg significantly outperforms state-of-the-art MLLM baselines, offering a scalable solution for mapping smallholder agricultural environments.
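The abstract does not spell out the instruction-tuning data format, so the following is only a rough sketch of the general idea: each training record points at the whole image (the global context) while the target text covers a single patch. Everything in it is an assumption made for illustration, including the record fields (`image`, `prompt`, `response`), the prompt wording, the digit-per-pixel mask encoding, and the helper names `patch_to_text` and `build_samples`; none of these are taken from the paper.

```python
import json


def patch_to_text(mask, top, left, size):
    """Serialize one size x size patch of a class-index mask as rows of digits.

    Hypothetical encoding: one digit per pixel, one text line per row.
    """
    rows = []
    for r in range(top, top + size):
        rows.append("".join(str(mask[r][c]) for c in range(left, left + size)))
    return "\n".join(rows)


def build_samples(image_path, mask, patch_size):
    """Build one instruction-tuning record per non-overlapping patch.

    Every record references the full image, but the response text
    describes only the patch named in the prompt, which keeps the
    generated sequence short regardless of image size.
    """
    height, width = len(mask), len(mask[0])
    samples = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            samples.append({
                "image": image_path,  # full image supplies global context
                "prompt": (
                    f"Segment the {patch_size}x{patch_size} patch whose "
                    f"top-left corner is at row {top}, column {left}."
                ),
                "response": patch_to_text(mask, top, left, patch_size),
            })
    return samples


# Toy 4x4 mask: 1 = field, 0 = background (illustrative labels only).
toy_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]

samples = build_samples("tile_0001.png", toy_mask, patch_size=2)
print(json.dumps(samples[0], indent=2))
```

In this sketch the response length depends only on the patch size, not on the image size, which is one plausible way a patch-wise target could ease the context-length bottleneck the abstract mentions.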

Problem

Research questions and friction points this paper is trying to address.

agricultural landscape segmentation

Global South

high-resolution satellite imagery

labeled data scarcity

domain alignment gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Models

Agricultural Segmentation

Decoder-Free Architecture