UniChange: Unifying Change Detection with Multimodal Large Language Model

📅 2025-11-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Remote sensing change detection suffers from a split between binary change detection (BCD) and semantic change detection (SCD), weak generalization, and poor cross-dataset transferability. To address this, the paper integrates multimodal large language models (MLLMs) into the task, which it presents as a first. It proposes a unified modeling paradigm built on text prompting and three dedicated special tokens, [T1], [T2], and [CHANGE], that enables joint BCD and SCD modeling with implicit knowledge fusion and removes the reliance on fixed classification heads. The method combines generative language understanding with a dedicated change-aware module, supporting flexible task guidance and zero-shot adaptation. On four benchmarks (WHU-CD, S2Looking, LEVIR-CD+, and SECOND) it reports state-of-the-art IoU scores of 90.41, 53.04, 78.87, and 57.62, respectively, surpassing previous methods.

📝 Abstract
Change detection (CD) is a fundamental task for monitoring and analyzing land cover dynamics. While recent high-performance models and high-quality datasets have significantly advanced the field, a critical limitation persists: current models typically acquire limited knowledge from single-type annotated data and cannot concurrently leverage diverse binary change detection (BCD) and semantic change detection (SCD) datasets. This constraint leads to poor generalization and limited versatility. Recent advances in Multimodal Large Language Models (MLLMs) open new possibilities for a unified CD framework. We leverage the language priors and unification capabilities of MLLMs to develop UniChange, the first MLLM-based unified change detection model. UniChange integrates generative language abilities with specialized CD functionalities. Our model unifies the BCD and SCD tasks through the introduction of three special tokens: [T1], [T2], and [CHANGE]. Furthermore, UniChange uses text prompts to guide the identification of change categories, eliminating the reliance on predefined classification heads. This design allows UniChange to acquire knowledge from multi-source datasets effectively, even when their class definitions conflict. Experiments on four public benchmarks (WHU-CD, S2Looking, LEVIR-CD+, and SECOND) demonstrate state-of-the-art (SOTA) performance, with IoU scores of 90.41, 53.04, 78.87, and 57.62, respectively, surpassing all previous methods. The code is available at https://github.com/Erxucomeon/UniChange.
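The results above are reported as IoU (intersection over union). As a reminder of what that number measures, here is a minimal sketch for a single pair of binary change masks. The function name `binary_iou`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the paper's own evaluation code may aggregate counts over a whole dataset and handle edge cases differently.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """IoU of the 'changed' class for two same-shaped 0/1 masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")

    intersection = 0  # pixels marked changed in both masks
    union = 0         # pixels marked changed in at least one mask
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("masks must have the same number of columns")
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += int(p and t)
            union += int(p or t)

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are changed in at least one mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(f"IoU = {100 * binary_iou(pred, target):.2f}")  # prints "IoU = 50.00"
```

Scores such as 90.41 correspond to this ratio expressed as a percentage.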
Problem

Research questions and friction points this paper is trying to address.

Unifying binary and semantic change detection tasks
Overcoming limited generalization from single-type datasets
Leveraging multimodal language models for versatile change analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multimodal large language model for unified change detection
Introduces special tokens [T1], [T2], and [CHANGE] for task unification
Employs text prompts instead of predefined classification heads
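To make the last point concrete, the sketch below shows why moving the class vocabulary into a text prompt lets one model serve datasets with different label sets: the class list becomes input data rather than a fixed output layer. Everything here is hypothetical, including the function name `build_query`, the prompt wording, and the example class names; this summary does not give the paper's actual prompt templates, and the sketch does not model how [T1], [T2], or [CHANGE] are used.

```python
from typing import Optional, Sequence


def build_query(task: str, class_names: Optional[Sequence[str]] = None) -> str:
    """Build a hypothetical instruction for a bi-temporal image pair."""
    if task == "bcd":
        # Binary change detection needs no class vocabulary.
        return "Compare the two images and segment every region that changed."
    if task == "scd":
        if not class_names:
            raise ValueError("semantic change detection needs class names")
        # The vocabulary is plain text, so each dataset can supply its own.
        listed = ", ".join(class_names)
        return (
            "Compare the two images, segment the changed regions, and label "
            f"each region at both dates with one of: {listed}."
        )
    raise ValueError(f"unknown task: {task!r}")


# Two datasets with different (illustrative) label sets share one code path.
print(build_query("bcd"))
print(build_query("scd", ["building", "water", "tree"]))
print(build_query("scd", ["road", "farmland"]))
```

With a fixed classification head, the second and third calls would require different output layers; with a prompt, only the input string changes.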
Xu Zhang
TMCC, Computer Science, Nankai University
Danyang Li
Shuimu Scholar, Tsinghua University
Embodied AI, Mobile Computing, Internet of Things, Edge Computing, SLAM System
Xiaohang Dong
Nankai University
Tianhao Wu
CMEE, Sichuan Agricultural University
Hualong Yu
Jiangsu University of Science and Technology
Machine learning, Bioinformatics
Jianye Wang
TMCC, Computer Science, Nankai University
Qicheng Li
TMCC, Computer Science, Nankai University
Xiang Li
VCIP, Computer Science, Nankai University; NKIARI, Futian, Shenzhen