HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts

๐Ÿ“… 2023-11-23
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 1
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the challenge of poor generalization to unseen attributeโ€“object compositions in compositional zero-shot learning (CZSL). To this end, we propose a novel memory-driven and composition-aware framework. Methodologically, we introduce the first integration of modern Hopfield networks with a soft mixture-of-experts (MoE) architecture to construct a retrievable and composable semantic memory module, enabling dynamic prototype generation and matching grounded in hierarchical structures and semantic primitives. We further design an end-to-end differentiable compositional semantic embedding mechanism to support fine-grained compositional reasoning. Our approach achieves state-of-the-art performance on MIT-States and UT-Zappos benchmarks. Ablation studies confirm that each component significantly enhances cross-composition generalization. Overall, the framework provides an interpretable and scalable paradigm for CZSL.
๐Ÿ“ Abstract
Compositional Zero-Shot Learning (CZSL) has emerged as an essential paradigm in machine learning, aiming to overcome the constraints of traditional zero-shot learning by incorporating compositional thinking into its methodology. Conventional zero-shot learning has difficulty managing unfamiliar combinations of seen and unseen classes because it depends on pre-defined class embeddings. In contrast, Compositional Zero-Shot Learning exploits the inherent hierarchies and structural connections among classes, creating new class representations by combining attributes, components, or other semantic elements. In our paper, we propose a novel framework that, for the first time, combines the Modern Hopfield Network with a Mixture of Experts (HOMOE) to classify the compositions of previously unseen objects. Specifically, the Modern Hopfield Network provides a memory that stores label prototypes and identifies relevant labels for a given input image. Following this, the Mixture of Experts model integrates the image with the retrieved prototype to produce the final composition classification. Our approach achieves SOTA performance on several benchmarks, including MIT-States and UT-Zappos. We also examine how each component contributes to improved generalization.
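The retrieval step the abstract describes can be illustrated with the standard update rule of a modern (continuous) Hopfield network: a query embedding attends over stored label prototypes via a temperature-scaled softmax and returns a convex combination of them. This is a minimal sketch under assumed dimensions; the function name, `beta`, and the single-step update are illustrative, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hopfield_retrieve(query, prototypes, beta=8.0):
    """One update of a modern Hopfield network (Ramsauer et al.-style):
    the query attends over stored label prototypes; returns the retrieved
    (convexly combined) prototype and the relevance weights over labels.

    query: (d,) image embedding; prototypes: (num_labels, d) memory matrix.
    """
    scores = beta * (prototypes @ query)   # similarity to each stored prototype
    weights = softmax(scores)              # retrieval distribution over labels
    retrieved = weights @ prototypes       # convex combination of prototypes
    return retrieved, weights
```

With a large `beta`, the softmax sharpens and retrieval converges toward the single closest stored prototype, which is how the memory can surface the most relevant label for an input image.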
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of traditional zero-shot learning
Classifying unseen object compositions using Hopfield Network
Achieving state-of-the-art performance in CZSL benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines Hopfield Network with Mixture of Experts
Uses memory-based label prototypes for classification
Integrates image with prototypes for final classification
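The second stage, integrating the image with the retrieved prototype through a soft mixture of experts, can be sketched as follows. Every detail here (linear experts, concatenation of the two inputs, the gating network, all dimensions) is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

class SoftMoE:
    """Toy soft mixture of experts: each expert maps the concatenated
    (image feature, retrieved prototype) vector to composition logits,
    and a gating network mixes expert outputs with softmax weights."""

    def __init__(self, d_in, n_classes, n_experts=4, seed=0):
        rng = np.random.default_rng(seed)
        self.experts = [rng.normal(0.0, 0.1, (n_classes, d_in))
                        for _ in range(n_experts)]
        self.gate = rng.normal(0.0, 0.1, (n_experts, d_in))

    def __call__(self, image_feat, prototype):
        x = np.concatenate([image_feat, prototype])      # fuse the two inputs
        g = softmax(self.gate @ x)                       # soft gating weights
        # Weighted sum of expert logits -- "soft" routing: every expert
        # contributes, scaled by its gate weight.
        return sum(w * (E @ x) for w, E in zip(g, self.experts))
```

Because the gate produces a full softmax distribution rather than a hard top-k choice, the combination stays end-to-end differentiable, which matches the framework's emphasis on differentiable compositional reasoning.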
๐Ÿ”Ž Similar Papers
No similar papers found.
Do Huu Dat
VinUniversity
Machine Learning
Po Yuan Mao
Kyushu University
Tien Hoang Nguyen
VNU University of Engineering and Technology
W. Buntine
VinUniversity
Mohammed Bennamoun
Winthrop Professor - University of Western Australia
Artificial Intelligence, Computer Vision, Deep Learning, Face Recognition, Biometrics