🤖 AI Summary
In low-resource settings, topic inference suffers from instability, incoherence, and degraded cross-domain adaptability due to the inadvertent incorporation of irrelevant knowledge. Method: This paper formally defines the low-resource cross-domain topic modeling task and proposes an adaptive topic model based on a shared-encoder–dual-decoder architecture coupled with adversarial latent space alignment. Theoretically, we derive a generalization bound that reveals the synergistic interplay among domain consistency, latent-space alignment, and robustness to overfitting. Methodologically, a shared encoder captures domain-invariant semantics; dual decoders separately model domain-specific characteristics for source and target domains; and adversarial training enforces fine-grained matching of latent distributions. Results: On multiple low-resource benchmarks, our approach improves topic coherence by 12.3%, stability by 18.7%, and cross-domain transferability—significantly outperforming state-of-the-art methods.
📝 Abstract
Topic modeling plays a vital role in uncovering hidden semantic structures within text corpora, but existing models struggle in low-resource settings where limited target-domain data leads to unstable and incoherent topic inference. We address this challenge by formally introducing domain adaptation for low-resource topic modeling, where a high-resource source domain informs a low-resource target domain without overwhelming it with irrelevant content. We establish a finite-sample generalization bound showing that effective knowledge transfer depends on robust performance in both domains, minimizing latent-space discrepancy, and preventing overfitting to the data. Guided by these insights, we propose DALTA (Domain-Aligned Latent Topic Adaptation), a new framework that employs a shared encoder for domain-invariant features, specialized decoders for domain-specific nuances, and adversarial alignment to selectively transfer relevant information. Experiments on diverse low-resource datasets demonstrate that DALTA consistently outperforms state-of-the-art methods in terms of topic coherence, stability, and transferability.