🤖 AI Summary
Detecting subtle infrastructure damage—such as micro-cracks and corrosion—in complex environments remains challenging due to poor robustness and low sensitivity of conventional methods. To address this, we propose a miniature UAV-based multimodal visual perception system. It is the first to jointly exploit RGB, thermal infrared, and polarization imaging modalities, integrated via a lightweight cross-modal feature alignment network for fine-grained, joint damage modeling. Coupled with an edge-deployable real-time inference framework and a few-shot damage segmentation algorithm, the system enables synchronous acquisition of multi-dimensional diagnostic cues in a single flight. Evaluated on real-world bridge scenarios, it achieves 92.3% damage classification accuracy, sub-5 cm localization error, and <80 ms per-frame latency. Its F1-score improves by 17.6% over unimodal baselines, significantly enhancing detection stability and practicality under adverse conditions.