LAUD: Integrating Large Language Models with Active Learning for Unlabeled Data

πŸ“… 2025-11-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the challenge of scarce labeled data in real-world scenarios, which hinders efficient deployment of large language models (LLMs), this paper proposes LAUDβ€”a framework integrating LLMs with active learning. LAUD initializes the label set via zero-shot inference to mitigate cold-start issues and iteratively performs query selection, LLM-assisted labeling, and active sampling to substantially reduce human annotation effort. Its core innovation lies in embedding the LLM as a dynamic annotator within the active learning loop, enabling unsupervised initialization and efficient iterative optimization. Evaluated on a product name classification task, LAUD achieves superior performance over pure zero-shot and few-shot baselines using only minimal human verification. Results demonstrate its effectiveness and practicality in enhancing model generalization and annotation efficiency under low-resource conditions.
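The cycle described above (zero-shot initialization, then repeated query selection and LLM-assisted labeling) can be sketched in a few lines. Everything here is illustrative: `llm_zero_shot_label` is a hypothetical stand-in for a real LLM call, and random sampling stands in for whatever uncertainty-based query strategy the paper actually uses.

```python
import random

def llm_zero_shot_label(item):
    # Hypothetical stand-in for a zero-shot LLM call; here a toy
    # keyword rule mimics noisy zero-shot inference.
    return "electronics" if "phone" in item else "grocery"

def laud_loop(unlabeled, query_size=2, rounds=2, seed=0):
    """Sketch of the LAUD cycle: zero-shot cold-start, then iterative
    query selection -> LLM-assisted labeling, with humans only needing
    to verify the small queried batches."""
    rng = random.Random(seed)
    # 1. Cold-start: build an initial label set with zero-shot inference.
    labeled = {item: llm_zero_shot_label(item) for item in unlabeled[:query_size]}
    pool = [x for x in unlabeled if x not in labeled]
    for _ in range(rounds):
        if not pool:
            break
        # 2. Query selection: pick items for annotation (random here,
        #    standing in for an active-learning uncertainty criterion).
        queries = rng.sample(pool, min(query_size, len(pool)))
        # 3. LLM labels the queried items; a human would spot-check these.
        for q in queries:
            labeled[q] = llm_zero_shot_label(q)
            pool.remove(q)
    return labeled
```

With `query_size=2` and `rounds=2`, a six-item pool ends up fully labeled after one cold-start batch plus two query rounds, which is the kind of coverage-per-verification trade-off the framework targets.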

πŸ“ Abstract
Large language models (LLMs) have shown a remarkable ability to generalize beyond their pre-training data, and fine-tuning can elevate their performance to human level and beyond. In real-world scenarios, however, the lack of labeled data often prevents practitioners from obtaining well-performing models, forcing them to rely heavily on prompt-based approaches that are tedious, inefficient, and driven by trial and error. To alleviate this scarcity of labeled data, we present LAUD, a learning framework integrating LLMs with active learning for unlabeled data. LAUD mitigates the cold-start problem by constructing an initial label set with zero-shot learning. Experimental results show that LLMs derived from LAUD outperform LLMs with zero-shot or few-shot learning on commodity name classification tasks, demonstrating the effectiveness of LAUD.
Problem

Research questions and friction points this paper is trying to address.

Addresses the lack of labeled data for fine-tuning large language models
Reduces reliance on inefficient prompt-based trial-and-error methods
Solves the cold-start problem in unlabeled datasets using zero-shot learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating LLMs with active learning for unlabeled data
Mitigating cold-start via zero-shot learning initialization
Outperforming zero-shot and few-shot learning approaches
πŸ”Ž Similar Papers
No similar papers found.