🤖 AI Summary
Unsupervised cross-domain text classification commonly relies on source-domain model training, incurring high computational overhead and significant adaptation latency. Method: This paper proposes the first LLM-driven, zero-training domain adaptation paradigm that eliminates the need for source-model training. It directly leverages large language models (LLMs) to generate high-quality pseudo-labels for unlabeled target-domain data and introduces a semantic-similarity-guided contrastive knowledge distillation loss, abandoning conventional feature-space alignment assumptions. The framework requires only a single LLM inference pass followed by lightweight distillation. Contribution/Results: Evaluated on multiple cross-domain text classification benchmarks, the method achieves a 2.44% absolute accuracy improvement over state-of-the-art approaches, demonstrating superior effectiveness and generalization while substantially reducing computational cost and deployment latency.
📄 Abstract
Unsupervised domain adaptation leverages abundant labeled data from source domains to generalize to unlabeled target data. Prior research has primarily focused on learning domain-invariant features across the source and target domains. However, these methods typically require training a model on source-domain data, which is time-consuming and limits the model's reuse across applications with different source data. This paper introduces a simple framework that exploits the strong generalization capabilities of Large Language Models (LLMs) to annotate target data without any source-model training, followed by a novel similarity-based knowledge distillation loss. Extensive experiments on cross-domain text classification show that the framework achieves strong performance, with a 2.44% accuracy improvement over the SOTA method.
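To make the distillation step concrete, the sketch below shows one plausible form of a similarity-weighted pseudo-label loss: each target example is labeled by an LLM, and its cross-entropy term is weighted by the cosine similarity between the student's embedding and a prototype of the assigned class, so that examples whose representations disagree with their pseudo-label contribute less. The function name, the prototype-based weighting, and all tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def similarity_weighted_distillation_loss(student_logits, pseudo_labels,
                                          student_emb, class_prototypes):
    """Hypothetical similarity-guided distillation loss (a sketch, not the
    paper's exact loss).

    student_logits:   (N, C) student classifier outputs
    pseudo_labels:    (N,)   integer labels produced by the LLM
    student_emb:      (N, D) student embeddings of the target texts
    class_prototypes: (C, D) one embedding per class (e.g. label-name embeddings)
    """
    n = len(pseudo_labels)

    # Standard cross-entropy against the LLM pseudo-labels.
    z = student_logits - student_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ce = -np.log(probs[np.arange(n), pseudo_labels] + 1e-12)

    # Cosine similarity between each example and its pseudo-label's prototype.
    s = student_emb / (np.linalg.norm(student_emb, axis=1, keepdims=True) + 1e-12)
    p = class_prototypes / (np.linalg.norm(class_prototypes, axis=1,
                                           keepdims=True) + 1e-12)
    sim = (s * p[pseudo_labels]).sum(axis=1)

    # Down-weight examples that look dissimilar to their assigned class,
    # treating them as likely pseudo-label noise.
    weights = np.clip(sim, 0.0, None)
    return float((weights * ce).sum() / (weights.sum() + 1e-12))
```

Because the loss only reweights per-example terms, it drops into any standard fine-tuning loop; the key design choice is that noisy LLM annotations are suppressed by the embedding-space agreement check rather than by aligning source and target feature distributions.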