How Useful is Continued Pre-Training for Generative Unsupervised Domain Adaptation?

📅 2024-01-31
🏛️ Workshop on Representation Learning for NLP
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work investigates the effectiveness and underlying mechanisms of Continued Pre-Training (CPT) for generative Unsupervised Domain Adaptation (UDA). Addressing the gap that existing UDA research focuses predominantly on discriminative methods while generative UDA remains underexplored, the paper presents a systematic evaluation of CPT for generative UDA. The studied CPT paradigm is grounded in masked language modeling (MLM) and is compared against methods that learn domain-invariant representations. Extensive ablations across diverse model architectures, fine-tuning strategies, and data scales demonstrate robust generalizability, with CPT substantially improving target-domain classification accuracy. Its core mechanism is the implicit acquisition of downstream classification ability through predicting task-informative masked tokens during MLM. The work further connects CPT with instruction tuning through the lens of task-guided representation learning, suggesting a shared principle behind effective adaptation.
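The MLM objective at the heart of the CPT paradigm can be illustrated with a toy sketch: corrupt unlabeled target-domain text by masking a fraction of its tokens, then train the model to reconstruct the originals. The function name `mask_tokens` and the masking rate are illustrative BERT-style conventions, not the paper's exact implementation.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Randomly replace a fraction of tokens with [MASK].

    Returns the corrupted sequence and per-position labels: the original
    token at masked positions (the model must predict these), None at
    positions that are not scored by the MLM loss.
    """
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK_TOKEN)
            labels.append(tok)   # reconstruction target
        else:
            corrupted.append(tok)
            labels.append(None)  # position excluded from the loss
    return corrupted, labels

# Unlabeled target-domain sentence: masking a sentiment-bearing word such
# as "terrible" forces the model to infer task-informative content.
tokens = "the service at this restaurant was terrible".split()
corrupted, labels = mask_tokens(tokens, mask_prob=0.3)
```

The paper's proposed mechanism corresponds to cases where the masked word is informative for the downstream task: predicting it from context implicitly teaches the classification signal without any target-domain labels.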

📝 Abstract
Recent breakthroughs in scale have enabled the emergence of powerful generative language models, and the ability to fine-tune these models on various tasks by casting them into prompts or instructions. In this landscape, the problem of Unsupervised Domain Adaptation (UDA), or the problem of leveraging knowledge from a labeled source domain to an unlabeled target domain, has been left behind, with recent UDA methods still addressing discriminative classification. In particular, two popular UDA approaches, involving Continued Pre-Training (CPT) and learning domain invariant representations, have been under-explored in the generative setting, signaling a gap. In this work, we evaluate the utility of CPT for generative UDA. We first perform an empirical evaluation to measure the trade-offs between CPT and strong methods promoting domain invariance. We further evaluate how well the benefits of CPT extend to different architectures, tuning methods and data regimes. We then motivate the use of CPT by studying to what degree it benefits classification performance on the target domain. Finally, we attempt to understand the mechanism behind which CPT improves classification performance on the unlabeled target domain. Our findings suggest that the model implicitly learns the downstream task while predicting masked words informative to that task. Our work connects the body of UDA research with that of instruction tuning, enabling an initial step towards a wider applicability of modern language models.
Problem

Research questions and friction points this paper is trying to address.

How useful is Continued Pre-Training for generative Unsupervised Domain Adaptation?
What are the trade-offs between CPT and methods promoting domain invariance?
By what mechanism does CPT improve classification on the unlabeled target domain?
Innovation

Methods, ideas, or system contributions that make the work stand out.

First systematic evaluation of CPT for generative UDA
Compares CPT with strong domain-invariance methods across architectures, tuning methods, and data regimes
Shows that MLM implicitly learns the downstream task via task-informative masked words, connecting UDA with instruction tuning
Rheeya Uppaal
Department of Computer Sciences, University of Wisconsin-Madison
Yixuan Li
Department of Computer Sciences, University of Wisconsin-Madison
Junjie Hu
Department of Computer Sciences and Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison