Multi-Granularity Mutual Refinement Network for Zero-Shot Learning

📅 2025-11-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Zero-shot learning (ZSL) methods often under-model the interactions between local region features, which weakens the discriminability of the semantic–visual mapping. To address this, we propose the Multi-Granularity Mutual Refinement Network (Mg-MRN), which explicitly models interactions between region features at different granularities (e.g., part-level and object-level) by learning decoupled multi-granularity features and fusing them across adjacent levels, thereby improving feature transferability and semantic–visual alignment. The method has two core components: a multi-granularity feature extraction module that mines decoupled region-level features, and a cross-granularity feature fusion module that refines the representation at each granularity using region representations from adjacent levels. Extensive experiments on three standard ZSL benchmarks (CUB, SUN, and AWA2), under both generalized and conventional ZSL settings, show results that are competitive with or superior to state-of-the-art methods. The source code is publicly available.

📝 Abstract

Zero-shot learning (ZSL) aims to recognize unseen classes with zero samples by transferring semantic knowledge from seen classes. Current approaches typically correlate global visual features with semantic information (i.e., attributes) or align local visual region features with corresponding attributes to enhance visual-semantic interactions. Although effective, these methods often overlook the intrinsic interactions between local region features, which can further improve the acquisition of transferable and explicit visual features. In this paper, we propose the Multi-Granularity Mutual Refinement Network (Mg-MRN), which refines discriminative and transferable visual features by learning decoupled multi-granularity features and cross-granularity feature interactions. Specifically, we design a multi-granularity feature extraction module to learn region-level discriminative features through decoupled region feature mining. Then, a cross-granularity feature fusion module strengthens the inherent interactions between region features of varying granularities. This module enhances the discriminability of the representation at each granularity level by integrating region representations from adjacent hierarchies, further improving ZSL recognition performance. Extensive experiments on three popular ZSL benchmark datasets demonstrate the superiority and competitiveness of our proposed Mg-MRN method. Our code is available at https://github.com/NingWang2049/Mg-MRN.
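
To make the attribute-based setting in the abstract concrete, here is a minimal sketch of how a generic ZSL classifier can score images against classes it has never seen. It is not the Mg-MRN implementation: the function name `zero_shot_predict`, the linear projection `W`, and the use of cosine similarity are all assumptions chosen for illustration.

```python
import numpy as np


def zero_shot_predict(visual_feats, W, class_attributes):
    """Assign each image to the class whose attribute vector it matches best.

    visual_feats:     (n_images, d_visual) image features
    W:                (d_visual, n_attributes) hypothetical visual-to-attribute
                      projection, assumed to be learned on seen classes
    class_attributes: (n_classes, n_attributes) attribute vectors, which may
                      describe classes with no training images
    """
    # Project visual features into the attribute (semantic) space.
    predicted_attrs = visual_feats @ W

    # Cosine similarity between predicted attributes and class attributes.
    eps = 1e-12  # avoids division by zero for all-zero rows
    a = predicted_attrs / (np.linalg.norm(predicted_attrs, axis=1, keepdims=True) + eps)
    c = class_attributes / (np.linalg.norm(class_attributes, axis=1, keepdims=True) + eps)
    scores = a @ c.T  # (n_images, n_classes)

    return scores.argmax(axis=1), scores
```

Because classes are represented only by their attribute vectors, adding an unseen class amounts to appending a row to `class_attributes`; no retraining of `W` is needed in this simplified picture.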

Problem

Research questions and friction points this paper is trying to address.

Enhancing visual-semantic interactions for zero-shot learning

Improving transferable visual features through multi-granularity refinement

Addressing overlooked intrinsic interactions between local region features

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granularity feature extraction for discriminative region learning

Cross-granularity fusion module enhancing region feature interactions

Mutual refinement network improving zero-shot recognition performance
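
The cross-granularity idea listed above can be sketched roughly as follows. This is an illustrative assumption, not the authors' module: the abstract only says that each granularity level integrates region representations from adjacent hierarchies, so the scaled dot-product attention, the residual update, and the names `cross_attend` and `mutual_refine` are all hypothetical choices.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(query_regions, context_regions):
    """Let each query region gather information from the context regions.

    query_regions:   (n_query, d) region features at one granularity
    context_regions: (n_context, d) region features at an adjacent granularity
    """
    d = query_regions.shape[-1]
    weights = softmax(query_regions @ context_regions.T / np.sqrt(d), axis=-1)
    return weights @ context_regions  # (n_query, d)


def mutual_refine(levels):
    """Refine every granularity level using only its adjacent levels.

    levels: list of float arrays ordered coarse to fine, each (n_regions_i, d).
    Returns a list of arrays with the same shapes.
    """
    refined = []
    for i, feats in enumerate(levels):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(levels)]
        update = np.zeros_like(feats)
        for j in neighbours:
            update += cross_attend(feats, levels[j])
        # Residual update, averaged over the adjacent levels (assumed design).
        refined.append(feats + update / max(len(neighbours), 1))
    return refined


# Example: one object-level region, 4 mid-level regions, 16 part-level regions.
rng = np.random.default_rng(0)
levels = [rng.normal(size=(n, 8)) for n in (1, 4, 16)]
refined = mutual_refine(levels)
```

All levels are refined from the original, unrefined inputs, so the update is symmetric across levels; a sequential coarse-to-fine or fine-to-coarse pass would be an equally plausible reading of the abstract.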
