HAMLET: Healthcare-focused Adaptive Multilingual Learning Embedding-based Topic Modeling

📅 2025-05-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional topic models struggle to capture polysemy, low-frequency terms, and subtle contextual distinctions in cross-lingual clinical texts, resulting in poor topic coherence and limited interpretability; conversely, LLM-generated topics often suffer from redundancy and insufficient semantic representativeness. To address these limitations, we propose HAMLET—a graph-driven cross-lingual medical topic modeling framework. HAMLET first leverages an LLM to generate initial topics, then constructs a document–topic–word heterogeneous graph, integrating BERT/SBERT semantic embeddings with graph neural networks (GNNs) for neural-enhanced semantic fusion. A novel graph-aware similarity metric is further introduced to optimize topic embeddings. Evaluated on English/French bilingual clinical datasets, HAMLET achieves significant improvements in topic coherence (+12.3%), discriminability, and cross-lingual consistency, demonstrating its effectiveness and robustness in multilingual healthcare settings.

Technology Category

Application Category

📝 Abstract
Traditional topic models often struggle with contextual nuances and fail to adequately handle polysemy and rare words. This limitation typically results in topics that lack coherence and quality. Large Language Models (LLMs) can mitigate this issue by generating an initial set of topics. However, these raw topics frequently lack refinement and representativeness, which leads to redundancy without lexical similarity and reduced interpretability. This paper introduces HAMLET, a graph-driven architecture for cross-lingual healthcare topic modeling that uses LLMs. The proposed approach leverages neural-enhanced semantic fusion to refine the embeddings of topics generated by the LLM. Instead of relying solely on statistical co-occurrence or human interpretation to extract topics from a document corpus, this method introduces a topic embedding refinement that uses Bidirectional Encoder Representations from Transformers (BERT) and Graph Neural Networks (GNN). After topic generation, a hybrid technique that involves BERT and Sentence-BERT (SBERT) is employed for embedding. The topic representations are further refined using a GNN, which establishes connections between documents, topics, words, similar topics, and similar words. A novel method is introduced to compute similarities. Consequently, the topic embeddings are refined, and the top k topics are extracted. Experiments were conducted using two healthcare datasets, one in English and one in French, from which six sets were derived. The results demonstrate the effectiveness of HAMLET.
Problem

Research questions and friction points this paper is trying to address.

Traditional topic models lack coherence and quality in healthcare contexts
LLM-generated topics need refinement to reduce redundancy and improve interpretability
Cross-lingual healthcare topic modeling requires enhanced semantic fusion techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses BERT and GNN for topic embedding refinement
Leverages neural-enhanced semantic fusion for coherence
Hybrid BERT-SBERT technique for cross-lingual modeling
Hajar Sakai
Hajar Sakai
Ph.D. in Industrial and Systems Engineering
Large Language ModelsText ClassificationTime Series Forecasting
S
Sarah S. Lam
School of Systems Science and Industrial Engineering, State University of University at Binghamton