🤖 AI Summary
This study addresses automated building damage assessment after tornadoes, where severe domain shift and extreme class imbalance undermine the accuracy of existing methods. The authors introduce the Quad-State Tornado Damage (QSTD) benchmark dataset and systematically evaluate 79 open-source deep learning models in over 2,300 controlled experiments, analyzing the joint effects of architecture, optimizer, and learning rate. The findings indicate that optimizer choice can matter more than model architecture: switching from Adam to SGD raises F1 by 25 to 38 points for the Vision Transformer and Swin Transformer families, and a low learning rate (1e-4) raises average F1 by 10.2 points across all architectures. The best-performing model, ConvNeXt-Base, achieves a Macro F1 score of 46.4% (+34.6 points over baseline) and an Ordinal Top-1 Accuracy of 85.5% on the held-out, cross-event Tuscaloosa-Moore Tornado Damage (TMTD) dataset.
📝 Abstract
Rapid and accurate building damage assessment in the immediate aftermath of tornadoes is critical for coordinating life-saving search and rescue operations, optimizing emergency resource allocation, and accelerating community recovery. However, current automated methods struggle with the unique visual complexity of tornado-induced wreckage, primarily due to severe domain shift from standard pre-training datasets and extreme class imbalance in real-world disaster data. To address these challenges, we introduce a systematic experimental framework evaluating 79 open-source deep learning models, encompassing both Convolutional Neural Networks (CNNs) and Vision Transformers, across over 2,300 controlled experiments on our newly curated Quad-State Tornado Damage (QSTD) benchmark dataset. Our findings reveal that achieving operational-grade performance hinges on a complex interaction between architecture and optimization, rather than architectural selection alone. Most strikingly, we demonstrate that optimizer choice can be more consequential than architecture: switching from Adam to SGD provided dramatic F1 gains of +25 to +38 points for Vision Transformer and Swin Transformer families, fundamentally reversing their ranking from bottom-tier to competitive with top-performing CNNs. Furthermore, a low learning rate of 1 × 10⁻⁴ proved universally critical, boosting average F1 performance by +10.2 points across all architectures. Our champion model, ConvNeXt-Base trained with these optimized settings, demonstrated strong cross-event generalization on the held-out Tuscaloosa-Moore Tornado Damage (TMTD) dataset, achieving 46.4% Macro F1 (+34.6 points over baseline) and retaining 85.5% Ordinal Top-1 Accuracy despite temporal and sensor domain shifts.
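
To illustrate why Macro F1 is an informative metric under extreme class imbalance, the following minimal sketch compares plain accuracy with Macro F1 on toy data. Everything in it is an assumption made for illustration: the four damage levels, the class counts, and the helper names `macro_f1` and `within_one_accuracy` do not come from the paper. In particular, the abstract does not define Ordinal Top-1 Accuracy; the "within one damage level" reading used here is only one plausible interpretation.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


def within_one_accuracy(y_true, y_pred):
    """ASSUMED reading of an ordinal accuracy: prediction is at most one level off."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= 1)
    return hits / len(y_true)


# Hypothetical, heavily imbalanced labels: 0 = no damage ... 3 = destroyed.
y_true = [0] * 90 + [1] * 6 + [2] * 3 + [3] * 1
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(f"accuracy:            {accuracy:.3f}")
print(f"macro F1:            {macro_f1(y_true, y_pred, 4):.3f}")
print(f"within-one accuracy: {within_one_accuracy(y_true, y_pred):.3f}")
```

On this toy data the majority-class predictor looks strong by accuracy yet scores poorly on Macro F1, because three of the four classes contribute an F1 of zero. This is the kind of gap that makes Macro F1 a stricter yardstick than accuracy when severe damage classes are rare.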