MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning

📅 2024-06-18

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

210K/year

🤖 AI Summary

Existing CZSL benchmarks rely solely on single-attribute annotations, ignoring the inherent semantic co-occurrence and interdependence among multiple attributes—leading to annotation bias and flawed evaluation. To address this, we introduce MAC, the first zero-shot learning benchmark supporting multi-attribute composition: it comprises 18,217 images annotated with 11,067 fine-grained attribute combinations, averaging 30.2 attributes per object. MAC is the first to systematically model higher-order semantic synergies among attributes. We propose the MM-encoder, which disentangles attribute and object representations and incorporates graph-structured modeling to capture complex, high-order attribute correlations. Evaluated on MAC, our approach achieves substantial gains in multi-attribute composition recognition accuracy. This work shifts CZSL evaluation from oversimplified single-attribute assumptions toward realistic, compositional scenarios and establishes a new standard benchmark for rigorous assessment.

Technology Category

Application Category

📝 Abstract

Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Real-world objects often possess multiple interrelated attributes, and current datasets' narrow attribute scope and single attribute labeling introduce annotation biases, undermining model performance and evaluation. To address these limitations, we introduce the Multi-Attribute Composition (MAC) dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations. MAC includes an average of 30.2 attributes per object and 65.4 objects per attribute, facilitating better multi-attribute composition predictions. Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task. We also develop solutions for multi-attribute compositional learning and propose the MM-encoder to disentangling the attributes and objects.

Problem

Research questions and friction points this paper is trying to address.

Addresses limitations in existing CZSL datasets focusing on single attributes.

Introduces MAC dataset with multiple interrelated attributes for realistic CZSL evaluation.

Proposes MVP-Integrator for improved multi-attribute CZSL performance and efficiency.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Multi-Attribute Composition (MAC) dataset

Proposes Multi-attribute Visual-Primitive Integrator (MVP-Integrator)

Enhances semantic understanding in zero-shot learning

🔎 Similar Papers

Graph-guided Cross-composition Feature Disentanglement for Compositional Zero-shot Learning