Vision-Language Agents for Interactive Forest Change Analysis

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This study addresses the challenge of integrating pixel-level detection with semantic interpretation in forest change analysis by proposing the first large language model (LLM)-driven interactive vision-language agent capable of multi-task change interpretation through natural language queries. The method combines vision-language models with LLMs in a Multi-level Change Interpretation (MCI) framework and introduces Forest-Change, the first forest change dataset annotated with semantic labels. Experimental results show that the proposed approach achieves 67.10% mIoU and 40.17% BLEU-4 on Forest-Change, and 88.13% mIoU and 34.41% BLEU-4 on the LEVIR-MCI-Trees subset, improving the accuracy, interpretability, and interactive efficiency of remote sensing-based change analysis.

📝 Abstract
Modern forest monitoring workflows increasingly benefit from the growing availability of high-resolution satellite imagery and advances in deep learning. Two persistent challenges in this context are accurate pixel-level change detection and meaningful semantic change captioning for complex forest dynamics. While large language models (LLMs) are being adapted for interactive data exploration, their integration with vision-language models (VLMs) for remote sensing image change interpretation (RSICI) remains underexplored. To address this gap, we introduce an LLM-driven agent for integrated forest change analysis that supports natural language querying across multiple RSICI tasks. The proposed system builds upon a multi-level change interpretation (MCI) vision-language backbone with LLM-based orchestration. To facilitate adaptation and evaluation in forest environments, we further introduce the Forest-Change dataset, which comprises bi-temporal satellite imagery, pixel-level change masks, and multi-granularity semantic change captions generated using a combination of human annotation and rule-based methods. Experimental results show that the proposed system achieves mIoU and BLEU-4 scores of 67.10% and 40.17% on the Forest-Change dataset, and 88.13% and 34.41% on LEVIR-MCI-Trees, a tree-focused subset of the LEVIR-MCI benchmark for joint change detection and captioning. These results highlight the potential of interactive, LLM-driven RSICI systems to improve the accessibility, interpretability, and efficiency of forest change analysis. All data and code are publicly available at https://github.com/JamesBrockUoB/ForestChat.
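For readers unfamiliar with the reported change-detection metric, the sketch below shows how mean Intersection-over-Union (mIoU) is typically computed over a binary change mask (0 = no change, 1 = forest change). This is an illustrative, minimal implementation, not the authors' evaluation code; the toy labels are invented for the example.

```python
def miou(pred, target, num_classes=2):
    """Mean Intersection-over-Union over flattened label maps.

    IoU per class = |pred == c AND target == c| / |pred == c OR target == c|;
    classes absent from both maps are skipped before averaging.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 6-pixel example (hypothetical values): 0 = no change, 1 = forest change
pred   = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(miou(pred, target))  # → 0.5 (IoU is 0.5 for each of the two classes)
```

In practice the intersections and unions are accumulated over all pixels of all test images before the per-class ratios are averaged, but the per-class formula is the same.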
Problem

Research questions and friction points this paper is trying to address.

forest change analysis
pixel-level change detection
semantic change captioning
vision-language models
remote sensing image change interpretation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Agent
LLM-driven RSICI
Multi-level Change Interpretation
Forest-Change Dataset
Interactive Forest Monitoring
James Brock
University of Bristol, Beacon House, Queens Road, Bristol, United Kingdom
Ce Zhang
Lecturer, School of Geographical Sciences, University of Bristol, UK
Machine Learning · Deep Learning · Geospatial Data Science · Remote Sensing
N. Anantrasirichai
University of Bristol, Beacon House, Queens Road, Bristol, United Kingdom