Now you see it, Now you don't: Damage Label Agreement in Drone & Satellite Post-Disaster Imagery

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses a critical reliability challenge in post-disaster AI assessment: inconsistent building damage annotations between drone and satellite imagery. It presents the first systematic audit of label consistency across multiple hurricane events. Using a unified damage label schema, shared high-precision building locations, and a dataset 19.05× larger than the most relevant prior work, the study applies cross-platform alignment of coincident imagery, chi-square and Kolmogorov–Smirnov tests, and multi-source consistency analysis. Results reveal a 29.02% annotation inconsistency rate, with satellite imagery under-reporting damage by at least 20.43% relative to drone imagery (p < 1.2 × 10⁻¹¹⁷); the damage label distributions from the two platforms are statistically distinct (p < 5.1 × 10⁻¹⁷⁵). The work establishes a quality-auditing framework for disaster remote sensing data and offers four evidence-based recommendations to improve the transparency and decision reliability of computer vision and machine learning (CV/ML) damage assessment systems, providing both a methodological foundation and empirical evidence for trustworthy remote sensing based damage assessment.
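For intuition, here is a minimal sketch (not the authors' released code) of how such a cross-platform agreement audit could be run with SciPy, assuming one paired ordinal damage grade per building from each platform; the simulated labels and variable names are illustrative only:

```python
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

# Hypothetical paired ordinal damage grades per building (0 = no damage
# ... 4 = destroyed); real inputs would come from the paired annotations.
rng = np.random.default_rng(0)
drone = rng.integers(0, 5, size=1000)
# Simulate satellite labels that sometimes sit one grade lower.
satellite = np.maximum(drone - rng.integers(0, 2, size=1000), 0)

# Disagreement rate: share of buildings whose two labels differ.
disagreement = np.mean(drone != satellite)

# Chi-square test of independence on the drone x satellite contingency table.
table = np.zeros((5, 5), dtype=int)
np.add.at(table, (drone, satellite), 1)
chi2, p_chi2, dof, expected = chi2_contingency(table)

# Two-sample Kolmogorov-Smirnov test comparing the two marginal label
# distributions (KS assumes continuous data, so treat this as an
# approximation on ordinal grades).
ks_stat, p_ks = ks_2samp(drone, satellite)

print(f"disagreement: {disagreement:.2%}")
print(f"chi2 p = {p_chi2:.3g}, KS p = {p_ks:.3g}")
```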

📝 Abstract
This paper audits damage labels derived from coincident satellite and drone aerial imagery for 15,814 buildings across Hurricanes Ian, Michael, and Harvey, finding 29.02% label disagreement and significantly different distributions between the two sources, which presents risks and potential harms during the deployment of machine learning damage assessment systems. Currently, there is no known study of label agreement between drone and satellite imagery for building damage assessment. The only prior work that could be used to infer whether such imagery-derived labels agree is limited by differing damage label schemas, misaligned building locations, and low data quantities. This work overcomes these limitations by comparing damage labels using the same damage label schemas and building locations from three hurricanes, with the 15,814 buildings representing 19.05 times more buildings than the most relevant prior work considered. The analysis finds satellite-derived labels significantly under-report damage by at least 20.43% compared to drone-derived labels (p < 1.2×10⁻¹¹⁷), and satellite- and drone-derived labels represent significantly different distributions (p < 5.1×10⁻¹⁷⁵). This indicates that computer vision and machine learning (CV/ML) models trained on at least one of these distributions will misrepresent actual conditions, as the differing satellite- and drone-derived distributions cannot simultaneously represent the distribution of actual conditions in a scene. This potential misrepresentation poses ethical risks and potential societal harm if not managed. To reduce the risk of future societal harms, this paper offers four recommendations to improve reliability and transparency for decision-makers when deploying CV/ML damage assessment systems in practice.
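As a concrete illustration of the under-reporting figure, one simple estimator (a sketch under the assumption of paired ordinal grades, not a computation taken from the paper) is the fraction of buildings whose satellite grade falls strictly below the drone grade for the same building:

```python
import numpy as np

def under_report_rate(drone: np.ndarray, satellite: np.ndarray) -> float:
    """Fraction of buildings whose satellite-derived damage grade is
    strictly lower than their drone-derived grade."""
    return float(np.mean(satellite < drone))

# Toy example with five buildings and ordinal grades 0-4.
drone = np.array([3, 2, 4, 1, 0])
satellite = np.array([2, 2, 3, 1, 0])
print(under_report_rate(drone, satellite))  # 0.4, i.e. 40% under-reported
```
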
Problem

Research questions and friction points this paper is trying to address.

Assessing label disagreement in drone and satellite disaster imagery
Evaluating risks in machine learning damage assessment systems
Addressing ethical risks from misrepresented damage distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compares drone- and satellite-derived damage labels directly
Uses a shared damage label schema and building locations across platforms
Analyzes 15,814 buildings across three hurricanes