🤖 AI Summary
Existing cross-lingual topic models are often hindered by sparse bilingual resources, leading to incoherent topics or weak cross-lingual alignment. Recent approaches that leverage large language models (LLMs) offer improvements, but they typically incur high computational costs, risk hallucination, and rely on document-level processing or on access to internal token probabilities. This work proposes LLM-XTM, a framework that operates under black-box conditions, without requiring access to an LLM’s internal probability distributions. It uses prompt engineering to guide topic refinement and a self-consistency mechanism to quantify uncertainty and suppress hallucinations. On multilingual corpora, LLM-XTM improves both topic coherence and cross-lingual alignment, while reducing dependence on bilingual dictionaries and on frequent LLM invocations.
📝 Abstract
Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, operate at the document level, and are prone to hallucination, and prior white-box approaches require inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
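The abstract does not spell out how self-consistency is computed, so the following is only a minimal sketch of one common reading: query the black-box LLM several times for a refined topic word list, treat the fraction of samples that contain a word as its agreement score, and drop low-agreement words as likely hallucinations. The names `self_consistency_refine`, `sample_refinement`, `n_samples`, `min_agreement`, and `make_fake_llm` are hypothetical and are not taken from the paper; the stub sampler stands in for a real LLM call.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def self_consistency_refine(
    topic_words: List[str],
    sample_refinement: Callable[[List[str]], List[str]],
    n_samples: int = 5,
    min_agreement: float = 0.5,
) -> Tuple[List[str], Dict[str, float]]:
    """Keep only refined words that recur across repeated black-box samples.

    Illustrative assumption, not the paper's method: agreement is the share
    of samples in which a word appears, and it needs no token probabilities.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")

    counts: Counter = Counter()
    for _ in range(n_samples):
        # Deduplicate within a sample so each sample votes once per word.
        counts.update(set(sample_refinement(topic_words)))

    agreement = {word: count / n_samples for word, count in counts.items()}

    # Rank by agreement (highest first), breaking ties alphabetically.
    kept = sorted(
        (word for word, score in agreement.items() if score >= min_agreement),
        key=lambda word: (-agreement[word], word),
    )
    return kept, agreement


def make_fake_llm(responses: Iterable[List[str]]) -> Callable[[List[str]], List[str]]:
    """Hypothetical stand-in for an LLM call: replays canned responses in order."""
    iterator = iter(responses)

    def sample(_topic_words: List[str]) -> List[str]:
        return next(iterator)

    return sample
```

In practice `sample_refinement` would wrap a prompted LLM call with a nonzero sampling temperature. Words that appear in only one or two samples fall below the threshold, which is the sense in which repeated sampling can act as an uncertainty estimate without access to internal probabilities.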