🤖 AI Summary
Unsupervised cross-domain text classification commonly relies on source-domain model training, incurring high computational overhead and significant adaptation latency. Method: This paper proposes the first LLM-driven, zero-training domain adaptation paradigm that eliminates the need for source-model training. It directly leverages large language models (LLMs) to generate high-quality pseudo-labels for unlabeled target-domain data and introduces a semantic-similarity-guided contrastive knowledge distillation loss, abandoning conventional feature-space alignment assumptions. The framework requires only a single LLM inference pass followed by lightweight distillation. Contribution/Results: Evaluated on multiple cross-domain text classification benchmarks, the method achieves a 2.44% absolute accuracy improvement over state-of-the-art approaches, demonstrating superior effectiveness and generalization while substantially reducing computational cost and deployment latency.
📄 Abstract
Unsupervised domain adaptation leverages abundant labeled data from source domains to generalize to unlabeled target data. Prior research has primarily focused on learning domain-invariant features across the source and target domains. However, these methods typically require training a model on source-domain data, which is time-consuming and limits the model's reuse across applications with different source data. This paper introduces a simple framework that exploits the strong generalization capabilities of Large Language Models (LLMs) to annotate target data without any source-model training, followed by a novel similarity-based knowledge distillation loss. Extensive experiments on cross-domain text classification show that the framework achieves strong performance, with a 2.44% accuracy improvement over the SOTA method.
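To make the distillation step concrete, the sketch below shows one plausible form of a similarity-weighted pseudo-label loss: each target example is labeled by an LLM, and its cross-entropy term is weighted by the cosine similarity between the student's embedding and a prototype of the assigned class, so that examples whose representations disagree with their pseudo-label contribute less. The function name, the prototype-based weighting, and all tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def similarity_weighted_distillation_loss(student_logits, pseudo_labels,
                                          student_emb, class_prototypes):
    """Hypothetical similarity-guided distillation loss (a sketch, not the
    paper's exact loss).

    student_logits:   (N, C) student classifier outputs
    pseudo_labels:    (N,)   integer labels produced by the LLM
    student_emb:      (N, D) student embeddings of the target texts
    class_prototypes: (C, D) one embedding per class (e.g. label-name embeddings)
    """
    n = len(pseudo_labels)

    # Standard cross-entropy against the LLM pseudo-labels.
    z = student_logits - student_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ce = -np.log(probs[np.arange(n), pseudo_labels] + 1e-12)

    # Cosine similarity between each example and its pseudo-label's prototype.
    s = student_emb / (np.linalg.norm(student_emb, axis=1, keepdims=True) + 1e-12)
    p = class_prototypes / (np.linalg.norm(class_prototypes, axis=1,
                                           keepdims=True) + 1e-12)
    sim = (s * p[pseudo_labels]).sum(axis=1)

    # Down-weight examples that look dissimilar to their assigned class,
    # treating them as likely pseudo-label noise.
    weights = np.clip(sim, 0.0, None)
    return float((weights * ce).sum() / (weights.sum() + 1e-12))
```

Because the loss only reweights per-example terms, it drops into any standard fine-tuning loop; the key design choice is that noisy LLM annotations are suppressed by the embedding-space agreement check rather than by aligning source and target feature distributions.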