RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events

📅 2025-09-01
🤖 AI Summary
Existing remote sensing datasets lack bitemporal image pairs and fine-grained textual annotations, which hinders semantic understanding of dynamic disaster impacts. To address this, we introduce the first large-scale pre-/post-disaster remote sensing image–text dataset: 62,315 high-quality image pairs covering multiple disaster types, each annotated with a human-like natural-language description of the change. We propose a "vision-language joint change modeling" paradigm that combines human-in-the-loop curation with multi-stage automated pipelines to keep annotations accurate and semantically rich. The dataset is the first to systematically support bitemporal vision-language understanding, improving model interpretability and fine-grained change description. Extensive evaluation on multiple vision-language pretraining (VLP) benchmarks confirms its effectiveness for both training and assessment. Our work advances intelligent remote sensing interpretation toward greater readability, granularity, and semantic fidelity.

📝 Abstract
Remote sensing is critical for disaster monitoring, yet existing datasets lack temporal image pairs and detailed textual annotations. While single-snapshot imagery dominates current resources, it fails to capture dynamic disaster impacts over time. To address this gap, we introduce the Remote Sensing Change Caption (RSCC) dataset, a large-scale benchmark comprising 62,315 pre-/post-disaster image pairs (spanning earthquakes, floods, wildfires, and more) paired with rich, human-like change captions. By bridging the temporal and semantic divide in remote sensing data, RSCC enables robust training and evaluation of vision-language models for disaster-aware bi-temporal understanding. Our results highlight RSCC's ability to facilitate detailed disaster-related analysis, paving the way for more accurate, interpretable, and scalable vision-language applications in remote sensing. Code and dataset are available at https://github.com/Bili-Sakura/RSCC.
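To make the unit of data concrete, here is a minimal sketch of what one RSCC-style sample could look like: a pre-/post-disaster image pair of the same area plus a change caption. The class `ChangeCaptionSample`, its field names, and the helper `count_by_disaster` are hypothetical and are not taken from the RSCC release; the example captions are invented placeholders, not dataset content.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ChangeCaptionSample:
    """One bi-temporal sample: a pre-/post-disaster image pair plus a change caption.

    Hypothetical schema for illustration only; field names are assumptions,
    not the layout of the released RSCC files.
    """
    pre_image: Path      # image of the area captured before the event
    post_image: Path     # image of the same area captured after the event
    disaster_type: str   # e.g. "earthquake", "flood", "wildfire"
    caption: str         # natural-language description of what changed

    def __post_init__(self) -> None:
        # A change caption only makes sense for two distinct images.
        if self.pre_image == self.post_image:
            raise ValueError("pre- and post-disaster images must differ")
        # Every pair is expected to carry a non-empty description.
        if not self.caption.strip():
            raise ValueError("caption must be non-empty")


def count_by_disaster(samples: list[ChangeCaptionSample]) -> Counter:
    """Tally samples per disaster type, e.g. to check class balance before training."""
    return Counter(s.disaster_type for s in samples)


if __name__ == "__main__":
    # Placeholder samples; paths and captions are invented for illustration.
    samples = [
        ChangeCaptionSample(Path("pre/0001.png"), Path("post/0001.png"), "flood",
                            "Floodwater now covers the farmland east of the river."),
        ChangeCaptionSample(Path("pre/0002.png"), Path("post/0002.png"), "wildfire",
                            "Most of the forested hillside has burned."),
        ChangeCaptionSample(Path("pre/0003.png"), Path("post/0003.png"), "flood",
                            "Several roads in the residential area are submerged."),
    ]
    print(count_by_disaster(samples))
```

Keeping the pair and its caption in one frozen record means a model always receives both timestamps together, which is the property that distinguishes a change-caption dataset from single-snapshot image captioning.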
Problem

Research questions and friction points this paper is trying to address.

Existing datasets lack temporal image pairs for disaster monitoring
Existing datasets lack detailed textual annotations
Single-snapshot imagery cannot capture dynamic disaster impacts over time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset of pre-/post-disaster image pairs
Human-like change captions for temporal analysis
Training and evaluation of vision-language models for bi-temporal disaster understanding