LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing cross-lingual topic models are often hindered by sparse bilingual resources, leading to incoherent topics or weak cross-lingual alignment. While recent approaches leveraging large language models (LLMs) offer improvements, they typically incur high computational costs, risk generating hallucinations, and rely on document-level processing or access to internal token probabilities. This work proposes LLM-XTM, a novel framework that operates under black-box conditions without requiring access to an LLM’s internal probability distributions. By employing prompt engineering to guide topic refinement and introducing a self-consistency mechanism to quantify uncertainty and suppress hallucinations, LLM-XTM significantly enhances both topic coherence and cross-lingual alignment across multilingual corpora. Moreover, the approach substantially reduces dependence on bilingual dictionaries and frequent LLM invocations, offering a more efficient and robust solution for cross-lingual topic modeling.

📝 Abstract

Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, with prior white-box approaches requiring inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic refinement with self-consistency uncertainty quantification, enabling black-box, stable, and scalable enhancement of cross-lingual topic models. Experiments on multilingual corpora show that LLM-XTM achieves superior topic coherence and alignment while reducing reliance on bilingual dictionaries and expensive LLM calls.
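
The abstract does not spell out how self-consistency is computed, so the following is only a minimal sketch of one common way to do it under black-box conditions: query the LLM several times for a refined word list, treat each response as one vote per word, and keep only words that enough samples agree on. The function name `self_consistent_refinement`, the `sample_refinement` callback, and the `n_samples` and `min_agreement` parameters are all hypothetical and are not taken from the paper; the LLM call is replaced by a canned stub so the example runs offline.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def self_consistent_refinement(
    topic_words: Sequence[str],
    sample_refinement: Callable[[Sequence[str]], List[str]],
    n_samples: int = 5,
    min_agreement: float = 0.5,
) -> Dict[str, object]:
    """Keep refined topic words that recur across repeated black-box samples.

    Assumption (not from the paper): agreement is the fraction of samples
    that contain a word, and low-agreement words are treated as likely
    hallucinations and dropped.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")

    votes: Counter = Counter()
    for _ in range(n_samples):
        # Deduplicate within a sample so each sample votes once per word.
        votes.update(set(sample_refinement(topic_words)))

    agreement = {word: count / n_samples for word, count in votes.items()}
    kept = sorted(
        (w for w, a in agreement.items() if a >= min_agreement),
        key=lambda w: (-agreement[w], w),  # most agreed-upon first, ties by name
    )
    return {"words": kept, "agreement": agreement}


# Stub standing in for a black-box LLM: returns canned responses in order.
_canned = iter([
    ["economy", "market", "trade"],
    ["economy", "market", "bank"],
    ["economy", "trade", "unicorn"],
    ["economy", "market", "trade"],
    ["economy", "market", "inflation"],
])


def fake_llm(_words: Sequence[str]) -> List[str]:
    return next(_canned)


result = self_consistent_refinement(["economy", "markt", "trade"], fake_llm)
print(result["words"])
```

Only the sampled text outputs are used here, which is what makes the sketch compatible with a black-box setting; no token probabilities are required. The agreement scores can also serve as a rough per-word uncertainty signal.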

Problem

Research questions and friction points this paper is trying to address.

cross-lingual topic modeling

bilingual resources

topic coherence

topic alignment

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual topic modeling

large language models

black-box refinement