🤖 AI Summary
Zero-shot learning (ZSL) with LLM-generated class semantics faces two challenges: semantic unreliability, where LLM hallucinations produce non-visual or implausible concepts, and an opaque classification process. To address both, we propose InfZSL, a framework that dynamically generates phrase-level visual concepts and models class semantics as an unbounded, interpretable, image-groundable set of descriptive phrases. An entropy-based “goodness” scoring mechanism filters out hallucinated concepts and keeps those that are transferable and discriminative. The framework combines LLM-based dynamic phrase generation, entropy-weighted scoring, cross-modal alignment training, and visualization-based interpretability analysis. On three standard ZSL benchmarks, the method shows significant improvements. It also produces highly interpretable, visually grounded class concepts, which makes the reasoning behind each prediction traceable by a human.
📝 Abstract
Zero-shot learning (ZSL) aims to recognize unseen classes by aligning images with intermediate class semantics, such as human-annotated concepts or class definitions. An emerging alternative leverages large language models (LLMs) to automatically generate class documents. However, these methods often lack transparency in the classification process and may suffer from the well-known hallucination problem of LLMs, which results in non-visual class semantics. This paper redefines class semantics in ZSL with a focus on transferability and discriminability, and introduces a novel framework called Zero-shot Learning with Infinite Class Concepts (InfZSL). Our approach uses LLMs to dynamically generate an unlimited array of phrase-level class concepts. To address the hallucination challenge, we introduce an entropy-based scoring process that incorporates a "goodness" concept selection mechanism, ensuring that only the most transferable and discriminative concepts are selected. Our InfZSL framework not only demonstrates significant improvements on three popular benchmark datasets but also generates highly interpretable, image-grounded concepts. Code will be released upon acceptance.
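The abstract does not give the exact form of the entropy-based scoring, so the following is only a minimal illustrative sketch of one way such a score could work, not the InfZSL implementation. It assumes each candidate phrase comes with a vector of concept-to-class affinities (for example, similarities between the phrase and per-class image features), turns that vector into a distribution over classes, and treats low entropy as a sign of a discriminative concept. The function names, the temperature value, and the toy affinities are all hypothetical, and transferability is not modeled here.

```python
import math

def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(n), so the result lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(n)

def score_concepts(affinities, temperature=0.1):
    """Assumed 'goodness' proxy: 1 - normalized entropy of the class distribution.

    A concept that fires equally for every class scores near 0;
    a concept that singles out one class scores near 1.
    """
    return {
        concept: 1.0 - normalized_entropy(softmax(sims, temperature))
        for concept, sims in affinities.items()
    }

def select_concepts(affinities, k, temperature=0.1):
    """Keep the k highest-scoring concepts, best first."""
    scores = score_concepts(affinities, temperature)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Hypothetical toy data: each concept maps to affinities for three classes.
toy_affinities = {
    "striped black-and-white coat": [0.9, 0.1, 0.1],
    "lives in the wild": [0.5, 0.5, 0.5],
    "long curved horns": [0.2, 0.8, 0.3],
}
selected = select_concepts(toy_affinities, k=2)
print(selected)
```

In this toy example, the generic phrase "lives in the wild" has the same affinity for every class, so its class distribution is uniform and it is filtered out, while the two visually specific phrases are kept.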