HMR-Net: Hierarchical Modular Routing for Cross-Domain Object Detection in Aerial Images

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization capability of object detection in aerial imagery caused by discrepancies in spatial resolution, scene composition, and semantic labeling across domains. To tackle this challenge, the authors propose a hierarchical modular routing framework that integrates geospatial-aware global routing with category-semantic-conditioned expert modules. By jointly leveraging global expert allocation and local scene decomposition, the method specializes at two levels, across datasets and within individual scenes, and supports zero-shot detection of novel categories without fine-tuning. Experiments on four aerial image datasets show improvements in multi-domain generalization, region-specific detection accuracy, and open-category recognition performance.

📝 Abstract

Despite advances in object detection, aerial imagery remains a challenging domain, as models often fail to generalize across variations in spatial resolution, scene composition, and semantic label coverage. Differences in geographic context, sensor characteristics, and object distributions across datasets limit the capacity of conventional models to learn consistent and transferable representations. Shared models trained on such data tend to impose a unified representation across fundamentally different domains, resulting in poor performance on region-specific content and reduced flexibility when handling novel object categories. To address this, we propose a novel modular learning framework that enables structured specialization in aerial detection. Our method introduces a hierarchical routing mechanism with two levels of modularity: a global expert assignment layer that uses latent geographic embeddings to route datasets to specialized processing modules, and a local scene decomposition mechanism that allocates image subregions to region-specific sub-modules. This allows our method to specialize both across datasets and within complex scenes. Additionally, the framework contains a conditional expert module that uses external semantic information (e.g., category names or textual descriptions) to enable detection of novel object categories during inference, without the need for retraining or fine-tuning. By moving beyond monolithic representations, our method offers an adaptive framework for remote sensing object detection. Comprehensive evaluations on four datasets highlight improvements in multi-dataset generalization, regional specialization, and open-category detection.
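The two-level routing described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the dot-product gate, the hard top-1 selection, and every name here (`route`, `hierarchical_route`, `global_keys`, `local_keys`) are assumptions made for illustration, and the toy vectors stand in for learned geographic embeddings and region features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def route(embedding, gate_keys):
    """Score an embedding against each gate key; return (top-1 index, weights).

    Assumption: a dot-product gate with hard top-1 selection. The paper may
    use a different gate or a soft mixture of experts.
    """
    weights = softmax([dot(embedding, key) for key in gate_keys])
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights

def hierarchical_route(geo_embedding, region_features, global_keys, local_keys):
    """Level 1: pick a global expert from the image's geographic embedding.
    Level 2: assign each image subregion to a sub-module of that expert."""
    expert, _ = route(geo_embedding, global_keys)
    # Hypothetical layout: local_keys[e] holds the gate keys of expert e's sub-modules.
    sub_modules = [route(feat, local_keys[expert])[0] for feat in region_features]
    return expert, sub_modules

# Toy data: 2 global experts, each with 2 sub-modules, 2-dimensional embeddings.
global_keys = [[1.0, 0.0], [0.0, 1.0]]
local_keys = [
    [[1.0, 0.0], [0.0, 1.0]],   # sub-module keys of expert 0
    [[0.5, 0.5], [-1.0, 1.0]],  # sub-module keys of expert 1
]
geo_embedding = [0.2, 0.9]                    # stand-in for a latent geographic embedding
region_features = [[1.0, 1.0], [-1.0, 0.5]]   # stand-ins for two image subregions

expert, sub_modules = hierarchical_route(
    geo_embedding, region_features, global_keys, local_keys
)
print(expert, sub_modules)  # prints: 1 [0, 1]
```

In this toy setup the image is sent to global expert 1, and its two subregions go to different sub-modules of that expert, which mirrors the idea of specializing across datasets and within a single scene.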

Problem

Research questions and friction points this paper is trying to address.

cross-domain object detection

aerial images

domain generalization

novel category detection

remote sensing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Modular Routing

Cross-Domain Object Detection

Geographic Embedding