Vision-Language Agents for Interactive Forest Change Analysis

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This study addresses the challenge of integrating pixel-level detection with semantic interpretation in forest change analysis by proposing the first large language model (LLM)-driven interactive vision-language agent capable of multi-task change interpretation through natural language queries. The method combines vision-language models with LLMs in a Multi-level Change Interpretation (MCI) framework and introduces Forest-Change, the first forest change dataset annotated with semantic labels. Experimental results show that the proposed approach achieves 67.10% mIoU and 40.17% BLEU-4 on Forest-Change, and 88.13% mIoU and 34.41% BLEU-4 on the LEVIR-MCI-Trees subset, improving the accuracy, interpretability, and interactive efficiency of remote sensing-based change analysis.

📝 Abstract
Modern forest monitoring workflows increasingly benefit from the growing availability of high-resolution satellite imagery and advances in deep learning. Two persistent challenges in this context are accurate pixel-level change detection and meaningful semantic change captioning for complex forest dynamics. While large language models (LLMs) are being adapted for interactive data exploration, their integration with vision-language models (VLMs) for remote sensing image change interpretation (RSICI) remains underexplored. To address this gap, we introduce an LLM-driven agent for integrated forest change analysis that supports natural language querying across multiple RSICI tasks. The proposed system builds upon a multi-level change interpretation (MCI) vision-language backbone with LLM-based orchestration. To facilitate adaptation and evaluation in forest environments, we further introduce the Forest-Change dataset, which comprises bi-temporal satellite imagery, pixel-level change masks, and multi-granularity semantic change captions generated using a combination of human annotation and rule-based methods. Experimental results show that the proposed system achieves mIoU and BLEU-4 scores of 67.10% and 40.17% on the Forest-Change dataset, and 88.13% and 34.41% on LEVIR-MCI-Trees, a tree-focused subset of the LEVIR-MCI benchmark for joint change detection and captioning. These results highlight the potential of interactive, LLM-driven RSICI systems to improve the accessibility, interpretability, and efficiency of forest change analysis. All data and code are publicly available at https://github.com/JamesBrockUoB/ForestChat.
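For readers unfamiliar with the reported change-detection metric, the sketch below shows how mean Intersection-over-Union (mIoU) is typically computed over a binary change mask (0 = no change, 1 = forest change). This is an illustrative, minimal implementation, not the authors' evaluation code; the toy labels are invented for the example.

```python
def miou(pred, target, num_classes=2):
    """Mean Intersection-over-Union over flattened label maps.

    IoU per class = |pred == c AND target == c| / |pred == c OR target == c|;
    classes absent from both maps are skipped before averaging.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 6-pixel example (hypothetical values): 0 = no change, 1 = forest change
pred   = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(miou(pred, target))  # → 0.5 (IoU is 0.5 for each of the two classes)
```

In practice the intersections and unions are accumulated over all pixels of all test images before the per-class ratios are averaged, but the per-class formula is the same.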
Problem

Research questions and friction points this paper is trying to address.

forest change analysis
pixel-level change detection
semantic change captioning
vision-language models
remote sensing image change interpretation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Agent
LLM-driven RSICI
Multi-level Change Interpretation
Forest-Change Dataset
Interactive Forest Monitoring
James Brock
University of Bristol, Beacon House, Queens Road, Bristol, United Kingdom
Ce Zhang
Lecturer, School of Geographical Sciences, University of Bristol, UK
Machine Learning · Deep Learning · Geospatial Data Science · Remote Sensing
N. Anantrasirichai
University of Bristol, Beacon House, Queens Road, Bristol, United Kingdom