UAV-OVO: Out-of-Viewpoint Generalization in UAV Action Recognition

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses a weakness of action recognition models for unmanned aerial vehicles (UAVs): their ability to generalize degrades under viewpoint shift, particularly when the camera's depression angle changes from low to high. To this end, it introduces UAV-OVO, the first cross-view generalization benchmark for UAV action recognition. UAV-OVO estimates viewpoint scores from uncalibrated videos and partitions the test data into in-distribution (low depression angle) and out-of-distribution (high depression angle) subsets. The proposed method, LATER, combines LoRA fine-tuning with test-time feature re-centering. It treats the LoRA subspace as a semantic anchor and projects the target-domain feature displacement onto that subspace's orthogonal complement before re-centering, which suppresses viewpoint-related shortcut features. Experiments show that LATER substantially narrows the gap between in- and out-of-distribution performance and consistently improves action recognition accuracy under high depression angles across multiple video architectures.
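The viewpoint-based partition described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each video already carries a scalar view score where a larger value means a steeper depression angle. The thresholds `low_max` and `high_min`, the `id_test_frac` parameter, and the per-class matching rule are all hypothetical choices made for the sketch.

```python
from collections import Counter, defaultdict
import random


def split_by_view(videos, low_max, high_min, id_test_frac=0.2, seed=0):
    """Hypothetical UAV-OVO-style split.

    videos: list of (video_id, label, view_score); a larger score is assumed
    to mean a steeper (higher) depression angle.
    Scores <= low_max feed the training and ID test splits, scores >= high_min
    feed the OOD test split, and scores strictly inside the isolation band
    (low_max, high_min) are discarded.
    """
    if not low_max < high_min:
        raise ValueError("isolation band requires low_max < high_min")
    rng = random.Random(seed)
    low, high = defaultdict(list), defaultdict(list)
    for vid, label, score in videos:
        if score <= low_max:
            low[label].append(vid)
        elif score >= high_min:
            high[label].append(vid)
        # otherwise: inside the isolation band, dropped

    train, id_test, ood_test = [], [], []
    for label in sorted(set(low) | set(high)):
        low_vids, high_vids = sorted(low[label]), sorted(high[label])
        rng.shuffle(low_vids)
        rng.shuffle(high_vids)
        # Same count k per class in ID and OOD test sets -> matched class distribution.
        k = min(int(round(id_test_frac * len(low_vids))), len(high_vids))
        id_test += [(v, label) for v in low_vids[:k]]
        train += [(v, label) for v in low_vids[k:]]
        ood_test += [(v, label) for v in high_vids[:k]]
    return train, id_test, ood_test


if __name__ == "__main__":
    demo = [(f"v{i}", "wave" if i % 2 else "run", i / 10) for i in range(20)]
    tr, id_te, ood_te = split_by_view(demo, low_max=0.5, high_min=1.0, id_test_frac=0.34)
    print(Counter(l for _, l in id_te), Counter(l for _, l in ood_te))
```

Matching the per-class count `k` across the two test sets means any ID/OOD accuracy difference cannot be explained by a different label mix, which is the property the benchmark is designed to guarantee.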

📝 Abstract

UAV action recognition faces a deployment shift that standard benchmarks often obscure: a model trained on UAV footage captured from low-depression viewpoints may be required to recognize the same action classes from high-depression viewpoints. While the action labels remain unchanged, this shift alters body visibility, motion projection, and scene context, encouraging models to rely on viewpoint-specific shortcuts. We introduce UAV-OVO, an Out-of-Viewpoint generalization benchmark for UAV action recognition. UAV-OVO derives view scores from uncalibrated videos, uses a view-isolation band to assign low-depression videos to the training and in-distribution test splits while reserving high-depression videos for out-of-distribution testing, and constructs ID/OOD test sets matched by class distribution so that performance differences reflect viewpoint shift rather than label imbalance. Across representative video recognizers, UAV-OVO reveals a substantial ID/OOD gap: models that fit the low-depression training distribution well often fail to transfer to held-out high-depression views, exposing viewpoint shortcuts hidden by aggregate accuracy. We further propose LATER, LoRA-Anchored Test-time Re-centering, which first adapts the recognizer with Low-Rank Adaptation (LoRA) and then uses the learned LoRA subspace as a semantic anchor for online feature re-centering. Specifically, LATER projects target-domain displacement onto the orthogonal complement of the LoRA subspace before re-centering features, reducing viewpoint-induced drift while preserving task-relevant semantics. Together, UAV-OVO and LATER provide a controlled testbed and a practical adaptation method for viewpoint-robust UAV video understanding.
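The LATER re-centering step can be illustrated with a minimal NumPy sketch, under several assumptions. It takes the LoRA subspace to be the column space of a LoRA up-projection matrix `lora_B` in feature space. It measures the target-domain displacement as the gap between a running mean of test features and a stored source mean. It updates that running mean with an exponential moving average controlled by a hypothetical `momentum` parameter. The names `lora_basis` and `OrthoRecenter` are illustrative only, and the paper may define the subspace, the displacement, and the online update differently.

```python
import numpy as np


def lora_basis(lora_B: np.ndarray) -> np.ndarray:
    """Orthonormal basis U (d x r) for the column space of a LoRA matrix B (d x r).

    Assumption: B has full column rank, so reduced QR gives r orthonormal columns.
    """
    q, _ = np.linalg.qr(lora_B)  # reduced mode: q has shape (d, r)
    return q


class OrthoRecenter:
    """Hypothetical online re-centering anchored on a LoRA subspace."""

    def __init__(self, U: np.ndarray, source_mean: np.ndarray, momentum: float = 0.9):
        self.U = U
        self.source_mean = source_mean.copy()
        self.target_mean = source_mean.copy()  # running estimate of the test-time mean
        self.momentum = momentum

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        """feats: (n, d) batch of target-domain features; returns re-centered features."""
        batch_mean = feats.mean(axis=0)
        self.target_mean = (
            self.momentum * self.target_mean + (1.0 - self.momentum) * batch_mean
        )
        delta = self.target_mean - self.source_mean
        # Project the displacement onto the orthogonal complement: (I - U U^T) delta.
        delta_perp = delta - self.U @ (self.U.T @ delta)
        # Subtract only the off-subspace drift; LoRA-subspace components are untouched.
        return feats - delta_perp
```

Because the subtracted correction lies entirely in the orthogonal complement of the LoRA subspace, feature components inside that subspace pass through unchanged and only drift outside it is removed. This is the intended sense of using the subspace as a semantic anchor.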

Problem

Research questions and friction points this paper is trying to address.

UAV action recognition

out-of-viewpoint generalization

viewpoint shift

domain generalization

action recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Out-of-Viewpoint Generalization

UAV Action Recognition

LoRA