A2Seek: Towards Reasoning-Centric Benchmark for Aerial Anomaly Understanding

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing anomaly detection methods for unmanned aerial vehicle (UAV) imagery generalize poorly because they rely on static ground-level datasets and models, failing to handle the dynamic viewpoints, scale variations, and scene complexity inherent in aerial observation. To bridge this gap, we introduce A2Seek, the first reasoning-centric benchmark for aerial anomaly understanding, featuring multi-scene, high-resolution aerial videos with fine-grained annotations covering anomaly category, temporal boundaries, spatial localization, and causal linguistic explanations. Complementing the benchmark, we propose A2Seek-R1, a novel framework that integrates graph-of-thought (GoT)-guided supervised fine-tuning, airspace-customized reinforcement learning via A-GRPO, and a UAV-inspired active "seeking" dynamic attention mechanism. Extensive experiments demonstrate that A2Seek-R1 achieves +22.04% AP and +13.9% mIoU over baselines, significantly improving robustness to complex environments and out-of-distribution aerial scenes.

📝 Abstract
While unmanned aerial vehicles (UAVs) offer wide-area, high-altitude coverage for anomaly detection, they face challenges such as dynamic viewpoints, scale variations, and complex scenes. Existing datasets and methods, mainly designed for fixed ground-level views, struggle to adapt to these conditions, leading to significant performance drops in drone-view scenarios. To bridge this gap, we introduce A2Seek (Aerial Anomaly Seek), a large-scale, reasoning-centric benchmark dataset for aerial anomaly understanding. This dataset covers various scenarios and environmental conditions, providing high-resolution real-world aerial videos with detailed annotations, including anomaly categories, frame-level timestamps, region-level bounding boxes, and natural language explanations for causal reasoning. Building on this dataset, we propose A2Seek-R1, a novel reasoning framework that generalizes R1-style strategies to aerial anomaly understanding, enabling a deeper understanding of "Where" anomalies occur and "Why" they happen in aerial frames. To this end, A2Seek-R1 first employs a graph-of-thought (GoT)-guided supervised fine-tuning approach to activate the model's latent reasoning capabilities on A2Seek. Then, we introduce Aerial Group Relative Policy Optimization (A-GRPO) to design rule-based reward functions tailored to aerial scenarios. Furthermore, we propose a novel "seeking" mechanism that simulates UAV flight behavior by directing the model's attention to informative regions. Extensive experiments demonstrate that A2Seek-R1 achieves up to a 22.04% improvement in AP for prediction accuracy and a 13.9% gain in mIoU for anomaly localization, exhibiting strong generalization across complex environments and out-of-distribution scenarios. Our dataset and code will be released at https://hayneyday.github.io/A2Seek/.
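The "seeking" mechanism described in the abstract can be pictured as an attention-driven zoom: find the most informative region of a frame and re-inspect it at higher resolution, the way a UAV would fly closer to a point of interest. A minimal, hypothetical sketch of that idea (function and parameter names are illustrative, not the paper's implementation):

```python
import numpy as np

def seek_region(attention, frame, crop=64):
    """Hypothetical 'seeking' step: locate the attention peak and crop
    the frame around it, mimicking a UAV moving in to inspect a region."""
    h, w = attention.shape
    iy, ix = np.unravel_index(np.argmax(attention), attention.shape)
    # Map attention-map coordinates to frame coordinates.
    fy = int(iy * frame.shape[0] / h)
    fx = int(ix * frame.shape[1] / w)
    # Clamp the crop window so it stays inside the frame.
    y0 = min(max(fy - crop // 2, 0), frame.shape[0] - crop)
    x0 = min(max(fx - crop // 2, 0), frame.shape[1] - crop)
    return frame[y0:y0 + crop, x0:x0 + crop]
```

In a full pipeline, the cropped patch would be fed back to the model for a second, closer look; the paper's mechanism operates inside the model's attention rather than on raw pixels.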
Problem

Research questions and friction points this paper is trying to address.

Addressing UAV anomaly detection challenges like dynamic viewpoints and scale variations
Overcoming limitations of ground-level datasets in drone-view scenarios
Enhancing aerial anomaly understanding with reasoning-centric benchmarks and frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-of-thought-guided supervised fine-tuning approach
Aerial Group Relative Policy Optimization (A-GRPO)
Novel seeking mechanism simulating UAV flight behavior
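Of these, A-GRPO builds on group relative policy optimization (GRPO), in which several sampled responses per prompt are scored with rule-based rewards and each reward is normalized against its group, replacing a learned value network. A minimal sketch under that assumption; the reward terms here (an IoU score plus a small format bonus) are illustrative stand-ins, not the paper's aerial-specific reward rules:

```python
import math

def group_relative_advantages(rewards):
    """GRPO-style advantages: z-normalize each reward within its group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # epsilon avoids division by zero
    return [(r - mean) / std for r in rewards]

def rule_based_reward(pred_box, gt_box, format_ok):
    """Toy rule-based reward: IoU of predicted vs. ground-truth anomaly
    box, plus a bonus for well-formed output (terms are hypothetical)."""
    ax1, ay1, ax2, ay2 = pred_box
    bx1, by1, bx2, by2 = gt_box
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    return iou + (0.1 if format_ok else 0.0)
```

The group-relative normalization means advantages always sum to zero within a group, so the policy update rewards responses only for beating their siblings, not for absolute reward scale.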
Mengjingcheng Mo
Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, China
Xinyang Tong
Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, China
Jiaxu Leng
Chongqing University of Posts and Telecommunications
Computer Vision
Mingpi Tan
Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, China
Jiankang Zheng
Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, China
Yiran Liu
Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, China
Haosheng Chen
Chongqing University of Posts and Telecommunications; Xiamen University
Computer Vision
Ji Gan
Chongqing University of Posts and Telecommunications
Handwriting recognition and generation
Weisheng Li
Chongqing University of Posts and Telecommunications
Image processing, pattern recognition, machine learning, big data, and intelligent computing
Xinbo Gao
Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, China