🤖 AI Summary
Zero-shot learning (ZSL) methods often model the interactions between local region features poorly, which weakens the discriminability of the semantic-visual mapping. To address this, we propose the Multi-Granularity Mutual Refinement Network (Mg-MRN), which explicitly models mutual interactions between region features at different granularities (e.g., part level and object level) through decoupled multi-granularity feature learning and hierarchical fusion, thereby improving feature transferability and semantic-visual alignment. The method has two core components: a multi-granularity feature extraction module that mines decoupled region-level features, and a cross-granularity feature fusion module that refines each granularity level using region representations from adjacent levels. Experiments on three standard ZSL benchmarks (CUB, SUN, and AWA2) show superior or competitive performance under both generalized and conventional ZSL settings. The source code is publicly available.
📝 Abstract
Zero-shot learning (ZSL) aims to recognize unseen classes, for which no training samples are available, by transferring semantic knowledge from seen classes. Current approaches typically correlate global visual features with semantic information (i.e., attributes) or align local visual region features with their corresponding attributes to enhance visual-semantic interactions. Although effective, these methods often overlook the intrinsic interactions between local region features, which can further improve the acquisition of transferable and explicit visual features. In this paper, we propose the Multi-Granularity Mutual Refinement Network (Mg-MRN), which refines discriminative and transferable visual features by learning decoupled multi-granularity features and cross-granularity feature interactions. Specifically, we design a multi-granularity feature extraction module that learns region-level discriminative features through decoupled region feature mining. A cross-granularity feature fusion module then strengthens the inherent interactions between region features of varying granularities. This module enhances the discriminability of the representation at each granularity level by integrating region representations from adjacent hierarchies, further improving ZSL recognition performance. Extensive experiments on three popular ZSL benchmark datasets demonstrate the superiority and competitiveness of the proposed Mg-MRN method. Our code is available at https://github.com/NingWang2049/Mg-MRN.
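The idea of refining each granularity level with its adjacent levels, then classifying by compatibility with class attributes, can be illustrated with a toy sketch. This is not the Mg-MRN implementation: the function names `fuse_adjacent` and `predict`, the mixing weight `alpha`, the residual averaging rule, and the assumption that region features already live in attribute space are all hypothetical choices made for illustration only.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def fuse_adjacent(levels: Sequence[Vector], alpha: float = 0.5) -> List[Vector]:
    """Refine each granularity level with the mean of its adjacent levels.

    `levels` is ordered from coarse to fine. `alpha` is a hypothetical
    weight controlling how much neighbouring levels contribute.
    """
    refined: List[Vector] = []
    for i, feat in enumerate(levels):
        # Adjacent hierarchies: the level just above and just below, if present.
        neighbours = [levels[j] for j in (i - 1, i + 1) if 0 <= j < len(levels)]
        if not neighbours:
            refined.append(list(feat))
            continue
        mean = [sum(vals) / len(neighbours) for vals in zip(*neighbours)]
        # Residual-style update: keep the original feature, add neighbour context.
        refined.append([f + alpha * m for f, m in zip(feat, mean)])
    return refined


def predict(levels: Sequence[Vector], class_attributes: Dict[str, Vector]) -> str:
    """Pool the refined levels and return the best-matching class name.

    Assumes (for illustration) that features are already projected into the
    attribute space, so compatibility is a plain dot product.
    """
    refined = fuse_adjacent(levels)
    pooled = [sum(vals) / len(refined) for vals in zip(*refined)]
    scores = {
        name: sum(p * a for p, a in zip(pooled, attrs))
        for name, attrs in class_attributes.items()
    }
    return max(scores, key=scores.get)
```

Because prediction depends only on class attribute vectors, an unseen class can be scored by adding its attribute vector to `class_attributes` without any training images, which is the property ZSL relies on.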