🤖 AI Summary
Fungal image zero-shot classification faces challenges due to the scarcity of real annotated data and the difficulty of semantic alignment across growth stages. Method: This paper proposes a growth-stage-aware synthetic data generation paradigm: fine-grained textual descriptions are generated using LLaMA3.2, then paired fungal images are synthesized via controllable image generation; both modalities are aligned within CLIP’s shared embedding space. Crucially, growth-stage knowledge is explicitly encoded as a prior constraint guiding text–image co-generation. Contribution/Results: To our knowledge, this is the first work to incorporate explicit growth-stage semantics into multimodal synthetic data construction. We systematically evaluate how LLM-generated text quality affects cross-stage knowledge transfer. Experiments demonstrate significant improvements in CLIP’s zero-shot classification accuracy, particularly for early growth stages, establishing a scalable, interpretable framework for few-shot biological image recognition.
📝 Abstract
The effectiveness of zero-shot classification in large vision-language models (VLMs), such as Contrastive Language-Image Pre-training (CLIP), depends on access to extensive, well-aligned text-image datasets. In this work, we introduce two complementary data sources: one generated by large language models (LLMs) to describe the stages of fungal growth, and another comprising a diverse set of synthetic fungi images. These datasets are designed to enhance CLIP's zero-shot classification capabilities for fungi-related tasks. To ensure effective alignment between text and image data, we project both into CLIP's shared representation space, focusing on distinct fungal growth stages. We generate text using LLaMA3.2 to bridge modality gaps and synthetically create fungi images. Furthermore, we investigate knowledge transfer by comparing text outputs from different LLM prompting techniques to refine classification across growth stages.
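The zero-shot protocol the abstract describes — projecting stage-specific text prompts and fungal images into CLIP's shared embedding space and classifying by similarity — can be sketched as follows. This is a minimal illustration using random stand-in vectors: in the actual pipeline the embeddings would come from CLIP's text and image encoders, and the stage prompts (shown here as hypothetical examples) from the LLaMA3.2-generated descriptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stage-specific prompts; in the paper these descriptions
# are generated with LLaMA3.2 rather than written by hand.
stage_prompts = [
    "a photo of fungi at the spore stage",
    "a photo of fungi during hyphal growth",
    "a photo of fungi with a mature fruiting body",
]

dim = 512  # embedding size of CLIP ViT-B/32
# Stand-ins for CLIP text embeddings and one query image embedding.
text_emb = rng.normal(size=(len(stage_prompts), dim))
image_emb = rng.normal(size=dim)

def l2_normalize(x, axis=-1):
    """Project onto the unit sphere so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Zero-shot classification: cosine similarity in the shared space,
# converted to class probabilities with a temperature-scaled softmax.
logits = 100.0 * (l2_normalize(text_emb) @ l2_normalize(image_emb))
probs = np.exp(logits - logits.max())
probs /= probs.sum()

predicted_stage = stage_prompts[int(np.argmax(probs))]
print(predicted_stage)
```

The design point is that no fungal images are needed at training time: adding or refining a growth stage only requires a new text prompt, which is what makes the quality of the LLM-generated descriptions central to cross-stage transfer.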