AI Summary
To address the challenge of scarce labeled data in real-world scenarios, which hinders efficient deployment of large language models (LLMs), this paper proposes LAUD, a framework integrating LLMs with active learning. LAUD initializes the label set via zero-shot inference to mitigate cold-start issues and iteratively performs query selection, LLM-assisted labeling, and active sampling to substantially reduce human annotation effort. Its core innovation lies in embedding the LLM as a dynamic annotator within the active learning loop, enabling unsupervised initialization and efficient iterative optimization. Evaluated on a product name classification task, LAUD achieves superior performance over pure zero-shot and few-shot baselines using only minimal human verification. Results demonstrate its effectiveness and practicality in enhancing model generalization and annotation efficiency under low-resource conditions.
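The loop described above (zero-shot initialization, then iterative query selection and LLM-assisted labeling) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the keyword-rule `llm_zero_shot_label` stands in for real LLM inference, the word-counting classifier stands in for the fine-tuned model, and all function names, labels, and the confidence-based query selection are assumptions for demonstration.

```python
from collections import Counter, defaultdict

def llm_zero_shot_label(text):
    # Hypothetical stand-in for LLM zero-shot inference: a trivial
    # keyword rule plays the role of the model's predicted label.
    return "electronics" if "phone" in text or "laptop" in text else "grocery"

def train_classifier(labeled):
    # Toy "fine-tuned model": per-word label votes from the labeled set.
    word_votes = defaultdict(Counter)
    for text, label in labeled:
        for w in text.split():
            word_votes[w][label] += 1

    def predict_proba(text):
        votes = Counter()
        for w in text.split():
            votes += word_votes[w]
        total = sum(votes.values())
        if total == 0:
            # No evidence at all: maximally uncertain.
            return {"electronics": 0.5, "grocery": 0.5}
        return {lab: n / total for lab, n in votes.items()}

    return predict_proba

def laud_loop(unlabeled, rounds=3, batch=2):
    # Step 1: mitigate cold start by building the initial label set
    # with zero-shot inference (no human labels required).
    labeled = [(x, llm_zero_shot_label(x)) for x in unlabeled[:batch]]
    pool = unlabeled[batch:]
    for _ in range(rounds):
        if not pool:
            break
        model = train_classifier(labeled)
        # Step 2: active sampling -- query the items the current model
        # is least confident about (lowest max class probability).
        pool.sort(key=lambda x: max(model(x).values()))
        queries, pool = pool[:batch], pool[batch:]
        # Step 3: LLM-assisted labeling of the selected queries; in the
        # real framework a human would spot-check these annotations.
        labeled += [(x, llm_zero_shot_label(x)) for x in queries]
    return labeled

items = ["smart phone case", "gaming laptop", "organic apples",
         "phone charger", "fresh bread", "usb cable"]
final = laud_loop(items)
```

After three rounds every item carries a label obtained without manual annotation; only the spot-check step would involve a human, which is where the claimed reduction in annotation effort comes from.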
Abstract
Large language models (LLMs) have shown a remarkable ability to generalize beyond their pre-training data, and fine-tuning LLMs can elevate performance to human level and beyond. In real-world scenarios, however, the scarcity of labeled data often prevents practitioners from obtaining well-performing models, forcing them to rely heavily on prompt-based approaches that are tedious, inefficient, and driven by trial and error. To alleviate this lack of labeled data, we present LAUD, a learning framework integrating LLMs with active learning for unlabeled datasets. LAUD mitigates the cold-start problem by constructing an initial label set with zero-shot learning. Experimental results show that LLMs derived from LAUD outperform LLMs with zero-shot or few-shot learning on commodity name classification tasks, demonstrating the effectiveness of LAUD.