Forest-Chat: Adapting Vision-Language Agents for Interactive Forest Change Analysis

📅 2026-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of pixel-level change detection and semantic interpretation in forest remote sensing imagery by proposing the first large language model (LLM)-driven interactive vision-language agent framework. The approach tightly integrates LLMs with vision-language models to construct a Multi-level Change Interpretation (MCI) architecture, enabling joint reasoning across multiple tasks—including change detection, descriptive caption generation, object counting, and deforestation ratio estimation—through natural language queries and interactive point-based prompts. To support this research, we introduce Forest-Change, the first dataset featuring multi-granularity semantic annotations for forest change analysis. Experimental results demonstrate that the proposed system significantly enhances the interpretability, accessibility, and efficiency of change analysis on both the Forest-Change and LEVIR-MCI-Trees benchmarks.
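The page does not specify how the deforestation ratio task is computed. As a rough illustration only, one common formulation divides the area of detected forest loss by the forested area at the earlier date; the sketch below assumes binary NumPy masks (`forest_mask_t1`, `change_mask`) and is not the authors' implementation.

```python
import numpy as np

def deforestation_ratio(forest_mask_t1: np.ndarray, change_mask: np.ndarray) -> float:
    """Fraction of initially forested pixels that the change detector flags as lost.

    forest_mask_t1 : boolean array, True where pixels are forest at time T1.
    change_mask    : boolean array, True where change (forest loss) is detected.
    """
    forest_pixels = forest_mask_t1.sum()
    if forest_pixels == 0:
        return 0.0
    lost_pixels = np.logical_and(forest_mask_t1, change_mask).sum()
    return float(lost_pixels) / float(forest_pixels)

# Toy example: a 4x4 tile with 8 forest pixels, 2 of which are flagged as changed.
forest = np.zeros((4, 4), dtype=bool)
forest[:2, :] = True
change = np.zeros((4, 4), dtype=bool)
change[0, :2] = True
print(f"Estimated deforestation ratio: {deforestation_ratio(forest, change):.2%}")  # 25.00%
```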

📝 Abstract
The increasing availability of high-resolution satellite imagery, together with advances in deep learning, creates new opportunities for enhancing forest monitoring workflows. Two central challenges in this domain are pixel-level change detection and semantic change interpretation, particularly for complex forest dynamics. While large language models (LLMs) are increasingly adopted for data exploration, their integration with vision-language models (VLMs) for remote sensing image change interpretation (RSICI) remains underexplored, especially beyond urban environments. We introduce Forest-Chat, an LLM-driven agent designed for integrated forest change analysis. The proposed framework enables natural language querying and supports multiple RSICI tasks, including change detection, change captioning, object counting, deforestation percentage estimation, and change reasoning. Forest-Chat builds upon a multi-level change interpretation (MCI) vision-language backbone with LLM-based orchestration, and incorporates zero-shot change detection via a foundation change detection model together with an interactive point-prompt interface to support fine-grained user guidance. To facilitate adaptation and evaluation in forest environments, we introduce the Forest-Change dataset, comprising bi-temporal satellite imagery, pixel-level change masks, and multi-granularity semantic change captions generated through a combination of human annotation and rule-based methods. Experimental results demonstrate that Forest-Chat achieves strong performance on Forest-Change and on LEVIR-MCI-Trees, a tree-focused subset of LEVIR-MCI, for joint change detection and captioning, highlighting the potential of interactive, LLM-driven RSICI systems to improve accessibility, interpretability, and analytical efficiency in forest change analysis.
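The abstract describes LLM-based orchestration over the MCI vision-language backbone but gives no interface details. The sketch below is a minimal, assumed illustration of how a natural-language query might be routed to task-specific RSICI tools; the tool names, registry, and keyword-based router are hypothetical stand-ins, not Forest-Chat's actual design.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry mapping task labels to RSICI tools. In Forest-Chat the
# LLM would select the tool and its arguments; these lambdas are placeholders.
TOOLS: Dict[str, Callable[..., str]] = {
    "change_detection": lambda imgs, **kw: "binary change mask",
    "change_captioning": lambda imgs, **kw: "multi-granularity change caption",
    "object_counting": lambda imgs, **kw: "count of changed objects",
    "deforestation_ratio": lambda imgs, **kw: "percentage of forest lost",
}

def route_query(query: str, imgs, points: Optional[list] = None) -> str:
    """Toy intent router standing in for the LLM orchestrator.

    A real agent would let the LLM choose the tool, optionally conditioned on
    interactive point prompts; simple keyword matching is used here as a stub.
    """
    q = query.lower()
    if "how many" in q or "count" in q:
        tool = "object_counting"
    elif "percent" in q or "ratio" in q:
        tool = "deforestation_ratio"
    elif "describe" in q or "caption" in q:
        tool = "change_captioning"
    else:
        tool = "change_detection"
    return TOOLS[tool](imgs, points=points)

print(route_query("What percentage of forest was lost between the two dates?", imgs=None))
```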
Problem

Research questions and friction points this paper is trying to address.

forest change analysis
remote sensing image change interpretation
pixel-level change detection
semantic change interpretation
vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language model
large language model
forest change analysis
interactive remote sensing
zero-shot change detection
James Brock
School of Computer Science, University of Bristol, Merchant Venturers Building, 75 Woodland Road, Bristol, BS8 1UB, Bristol, United Kingdom
Ce Zhang
Lecturer, School of Geographical Sciences, University of Bristol, UK
Machine Learning · Deep Learning · Geospatial Data Science · Remote Sensing
N. Anantrasirichai
School of Computer Science, University of Bristol, Merchant Venturers Building, 75 Woodland Road, Bristol, BS8 1UB, Bristol, United Kingdom