LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing topic model evaluation metrics, such as perplexity, suffer from fundamental limitations: poor comparability across models, coverage of only a single quality dimension at a time, and misalignment with human judgments, which hinders comprehensive quality assessment. To address these issues, we propose WALM (Word Agreement with Language Model), an LLM-based framework that evaluates topic models *jointly*: it simultaneously quantifies both *topic semantic quality* and *document representation quality*, overcoming the one-aspect-at-a-time limitation of conventional metrics. WALM achieves semantic alignment through word-level agreement with LLM outputs, guided by carefully engineered, task-specific prompting. Extensive experiments demonstrate that WALM agrees strongly with human annotations (Spearman ρ > 0.85) across diverse topic models, enabling reliable cross-model comparison and complementing existing metrics. The implementation, including code and a ready-to-use toolkit, is publicly released.

📝 Abstract
Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality) at a time, which is insufficient to reflect the overall model performance. In this paper, we propose WALM (Word Agreement with Language Model), a new evaluation method for topic modeling that considers the semantic quality of document representations and topics in a joint manner, leveraging the power of Large Language Models (LLMs). With extensive experiments involving different types of topic models, WALM is shown to align with human judgment and can serve as a complementary evaluation method to the existing ones, bringing a new perspective to topic modeling. Our software package is available at https://github.com/Xiaohao-Yang/Topic_Model_Evaluation.
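To make the word-agreement idea concrete, here is a minimal, hypothetical sketch: an LLM is prompted to produce keywords for a document, and those keywords are compared against the top words of the topic model's dominant topic for that document. This is not the paper's exact metric (WALM's scoring and prompting are more involved); simple set overlap stands in for its semantic matching, and all function names and example words below are illustrative assumptions.

```python
# Hypothetical sketch of a word-agreement score in the spirit of WALM.
# Simple set overlap is used here in place of the paper's actual
# semantic-matching procedure.

def word_agreement(llm_words, topic_words):
    """Fraction of LLM-suggested keywords covered by the topic's top words."""
    llm_set = {w.lower() for w in llm_words}
    topic_set = {w.lower() for w in topic_words}
    if not llm_set:
        return 0.0
    return len(llm_set & topic_set) / len(llm_set)

# Toy example (illustrative words, not real model output):
llm_words = ["market", "stocks", "trading", "economy"]   # keywords an LLM might generate for a document
topic_words = ["market", "price", "stocks", "trade", "investor"]  # top words of the dominant topic
score = word_agreement(llm_words, topic_words)
print(round(score, 2))  # 2 of 4 LLM keywords appear among the topic's top words -> 0.5
```

A higher agreement indicates that the topic model's representation of the document is semantically consistent with what the LLM considers its key vocabulary, which is the intuition behind evaluating topics and document representations jointly.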
Problem

Research questions and friction points this paper is trying to address.

Topic Model Evaluation
Comparative Analysis
Comprehensive Assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

WALM
Semantic Quality Evaluation
Large Language Model Integration
Xiaohao Yang
He Zhao
CSIRO’s Data61, Sydney, Australia
Dinh Q. Phung
Monash University, Melbourne, Australia
W. Buntine
VinUniversity, Hanoi, Vietnam
Lan Du
Monash University, Melbourne, Australia