Discriminative Image Generation with Diffusion Models for Zero-Shot Learning

📅 2024-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
In zero-shot learning (ZSL), generative approaches often rely on manually annotated semantic prototypes, which suffer from poor interpretability and limited scalability. To address this, we propose DIG-ZSL, a discriminative image generation framework that introduces Discriminative Class Tokens (DCTs) into diffusion models for the first time. DIG-ZSL leverages a pre-trained category discrimination model (CDM) to guide the optimization of DCTs, eliminating the need for human-annotated prototypes and enabling the synthesis of highly discriminative and diverse images. The method jointly improves generation quality and semantic interpretability while supporting general-purpose ZSL. Evaluated on four standard benchmarks, DIG-ZSL significantly outperforms state-of-the-art methods that do not use human-annotated prototypes and matches or surpasses leading prototype-dependent approaches in classification accuracy. Generated images exhibit high fidelity and strong inter-class separability.

📝 Abstract
Generative Zero-Shot Learning (ZSL) methods synthesize class-related features based on predefined class semantic prototypes, showcasing superior performance. However, this feature generation paradigm falls short of providing interpretable insights. In addition, existing approaches rely on semantic prototypes annotated by human experts, which exhibit a significant limitation in their scalability to generalized scenes. To overcome these deficiencies, a natural solution is to generate images for unseen classes using text prompts. To this end, we present DIG-ZSL, a novel Discriminative Image Generation framework for Zero-Shot Learning. Specifically, to ensure the generation of discriminative images for training an effective ZSL classifier, we learn a discriminative class token (DCT) for each unseen class under the guidance of a pre-trained category discrimination model (CDM). Harnessing DCTs, we can generate diverse and high-quality images, which serve as informative unseen samples for ZSL tasks. Extensive experiments and visualizations on four datasets show that our DIG-ZSL: (1) generates diverse and high-quality images, (2) outperforms previous state-of-the-art methods based on non-human-annotated semantic prototypes by a large margin, and (3) achieves comparable or better performance than baselines that leverage human-annotated semantic prototypes. The code will be made available upon acceptance of the paper.
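The core idea of learning a DCT under CDM guidance can be illustrated with a toy sketch. Note that this is a hypothetical, minimal stand-in, not the paper's implementation: the frozen CDM is replaced by a tiny linear softmax classifier, the diffusion model is omitted entirely, and a token embedding is optimized by gradient ascent so the classifier assigns high probability to the target unseen class. All names, shapes, and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_classes, target = 8, 5, 3

# Frozen "CDM" stand-in: a fixed linear softmax classifier over embeddings.
W = rng.normal(size=(num_classes, dim))

# Learnable discriminative class token for the target unseen class.
token = rng.normal(size=dim)

def class_probs(t):
    """Softmax class probabilities the toy CDM assigns to embedding t."""
    logits = W @ t
    logits -= logits.max()  # numerical stability
    e = np.exp(logits)
    return e / e.sum()

lr = 0.5
onehot = np.eye(num_classes)[target]
for _ in range(200):
    p = class_probs(token)
    # Gradient of log p[target] w.r.t. the token: (onehot - p) @ W.
    token += lr * ((onehot - p) @ W)  # ascend the target log-probability

print(int(np.argmax(class_probs(token))))  # token now maps to the target class
```

In DIG-ZSL the optimized token would condition a text-to-image diffusion model, so that the generated images, rather than the token itself, are what the CDM discriminates; the sketch above only captures the token-optimization loop.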
Problem

Research questions and friction points this paper is trying to address.

Zero-shot Learning
Generative Methods
Explainability and Annotation Challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Model
Generative Zero-Shot Learning
Automatic Annotation