🤖 AI Summary
To address the deployment bottleneck of conventional machine learning models in data-scarce scenarios, such as emerging domains or urgent situations (e.g., early pandemic stages), this paper presents a systematic survey of few-shot learning (FSL) across audio, image, text, and multimodal domains. We conduct the first cross-modal comparative analysis, identifying shared challenges (e.g., high annotation cost, modality heterogeneity) and domain-specific requirements (e.g., temporal modeling for audio, semantic alignment for text). We propose a domain-sensitive method selection framework that unifies meta-learning, metric learning, data augmentation, prompt-based fine-tuning, and multimodal alignment techniques, explicitly characterizing their applicability boundaries and failure modes. Our taxonomy encompasses 120+ studies and recommends practice pathways per domain. The survey aims to improve the practical feasibility of FSL in low-data applications, including medical diagnosis and edge computing.
📝 Abstract
In a world where new domains are constantly discovered and machine learning (ML) is applied to automate new tasks every day, the number of samples available to train ML models becomes a challenge. Traditional ML training relies heavily on data volume, yet a large dataset with enough usable samples is not always easy to find, and collecting one often takes time. For instance, when a new human-transmissible disease such as COVID-19 breaks out, demand surges for rapid diagnosis, followed by rapid isolation of infected individuals from healthy ones to contain the spread, which creates an immediate need for tools and automation built on ML models. At the early stage of an outbreak, it is difficult not only to obtain many samples but also to understand the disease well enough to process the data needed to train a traditional ML model. Few-shot learning offers one solution. This paper presents the challenges and opportunities of few-shot approaches, which vary across major domains, i.e., audio, image, text, and their combinations, along with their strengths and weaknesses. This detailed understanding can help practitioners adopt approaches appropriate to different domains and applications.