🤖 AI Summary
This work addresses the challenge of real-time processing of high-resolution oblique video for rapid flood mapping in disaster response, where limited onboard computational resources on drones hinder timely analysis. The authors propose the Temporal Token Reuse (TTR) framework, which, for the first time, models spatiotemporal redundancy in video as reusable image-patch tokens. By introducing a lightweight similarity metric and an adaptive feature reuse mechanism, TTR dynamically bypasses redundant computations in the backbone network for static regions. Evaluated on both a newly curated oblique flood dataset and standard benchmarks, TTR reduces inference latency by 30% on edge devices while maintaining segmentation accuracy with a mean Intersection-over-Union (mIoU) degradation of less than 0.5%.
📝 Abstract
Effective disaster response relies on rapid flood mapping, where oblique aerial video is the primary modality for initial scouting due to its ability to maximize spatial coverage and situational awareness in limited flight time. However, on-board processing of high-resolution oblique streams is severely bottlenecked by the strict Size, Weight, and Power (SWaP) constraints of Unmanned Aerial Vehicles (UAVs). The computational density required to process these wide-field-of-view streams precludes low-latency inference on standard edge hardware. To address this, we propose Temporal Token Reuse (TTR), an adaptive inference framework that accelerates video segmentation on embedded devices. TTR exploits the intrinsic spatiotemporal redundancy of aerial video by formulating image patches as tokens; it uses a lightweight similarity metric to dynamically identify static regions and propagate their precomputed deep features, thereby bypassing redundant backbone computations. We validate the framework on standard benchmarks and a newly curated Oblique Floodwater Dataset designed for hydrological monitoring. Experimental results on edge-grade hardware demonstrate that TTR achieves a 30% reduction in inference latency with negligible degradation in segmentation accuracy (<0.5% mIoU). These findings confirm that TTR effectively shifts the operational Pareto frontier, enabling high-fidelity, real-time oblique video understanding for time-critical remote sensing missions.
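The core reuse mechanism described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patch size, the cosine-similarity metric, the threshold `tau`, and the function names (`patchify`, `reuse_mask`, `ttr_step`) are all assumptions chosen to show the idea of recomputing backbone features only for patches that changed between frames.

```python
import numpy as np

def patchify(frame, patch=16):
    # Split an HxW frame into flat (N, patch*patch) patch tokens.
    H, W = frame.shape
    p = frame.reshape(H // patch, patch, W // patch, patch)
    return p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def reuse_mask(prev_tokens, cur_tokens, tau=0.98):
    # Lightweight similarity: per-token cosine similarity (an assumed
    # choice; the paper only specifies "a lightweight similarity metric").
    num = (prev_tokens * cur_tokens).sum(axis=1)
    denom = (np.linalg.norm(prev_tokens, axis=1)
             * np.linalg.norm(cur_tokens, axis=1)) + 1e-8
    return (num / denom) >= tau  # True -> static patch, reuse cached feature

def ttr_step(cur_frame, prev_frame, cached_feats, backbone):
    # One inference step: run the backbone only on dynamic patches,
    # propagate cached deep features for static ones.
    cur, prev = patchify(cur_frame), patchify(prev_frame)
    mask = reuse_mask(prev, cur)
    feats = cached_feats.copy()
    dynamic = ~mask
    if dynamic.any():
        feats[dynamic] = backbone(cur[dynamic])
    return feats, mask
```

On a mostly static scene, `mask` is nearly all `True`, so the (expensive) `backbone` call touches only the few changed tokens, which is where the latency reduction comes from.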