LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

📅 2026-05-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing cross-lingual topic models are often hindered by sparse bilingual resources, leading to incoherent topics or weak cross-lingual alignment. Recent approaches that use large language models (LLMs) to refine topics improve interpretability, but they are computationally costly, operate at the document level, and are prone to hallucination; earlier white-box variants also require access to internal token probabilities, which are often unavailable. This work proposes LLM-XTM, a framework that operates under black-box conditions without access to an LLM’s internal probability distributions. It combines LLM-guided topic refinement with a self-consistency mechanism that quantifies uncertainty and helps suppress hallucinations. In experiments on multilingual corpora, LLM-XTM is reported to improve both topic coherence and cross-lingual alignment while reducing reliance on bilingual dictionaries and on expensive LLM calls.
📝 Abstract
Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, with prior white-box approaches requiring inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
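The abstract does not spell out how the self-consistency step works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the LLM is queried several times for a refined word list for one topic, that agreement across samples serves as the uncertainty signal, and that words below an agreement threshold are dropped. The names `self_consistency_refine`, `sample_refinement`, `n_samples`, `min_agreement`, and `replay_stub` are all hypothetical, and the LLM call is replaced by a stub so the example runs offline.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def self_consistency_refine(
    topic_words: Sequence[str],
    sample_refinement: Callable[[Sequence[str]], List[str]],
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> Tuple[List[str], float]:
    """Keep only the words that enough independent samples agree on.

    `sample_refinement` is a hypothetical stand-in for one stochastic,
    black-box LLM call: it receives the topic's words and returns a refined
    word list. Only the returned text is used, never token probabilities.
    Returns the kept words and a confidence score in [0, 1].
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")

    votes: Counter = Counter()
    for _ in range(n_samples):
        # Use a set so a word counts at most once per sample.
        votes.update(set(sample_refinement(topic_words)))

    # Agreement rate: fraction of samples that proposed each word.
    agreement = {word: count / n_samples for word, count in votes.items()}

    # Sort by agreement (descending), then alphabetically for determinism.
    ranked = sorted(agreement.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [word for word, rate in ranked if rate >= min_agreement]

    # Confidence: mean agreement of the kept words, 0.0 if none survive.
    confidence = sum(agreement[w] for w in kept) / len(kept) if kept else 0.0
    return kept, confidence


def replay_stub(responses: List[List[str]]) -> Callable[[Sequence[str]], List[str]]:
    """Build a fake sampler that replays canned responses in order."""
    iterator = iter(responses)
    return lambda _topic_words: next(iterator)
```

With five canned responses in which "bank" appears five times, "loan" and "credit" four times each, and "river" and "money" once each, the default threshold of 0.6 keeps "bank", "credit", and "loan" and discards the two low-agreement words. A low confidence score could likewise be used to skip the refinement and fall back to the original topic, though whether LLM-XTM does so is not stated in the abstract.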
Problem

Research questions and friction points this paper is trying to address.

cross-lingual topic modeling
bilingual resources
topic coherence
topic alignment
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual topic modeling
large language models
black-box refinement
self-consistency uncertainty
topic coherence