🤖 AI Summary
Traditional topic models suffer from degraded performance on medical texts due to sparse topics—those with few associated documents. To address this low-resource challenge, we propose ProtoTopic, the first few-shot medical topic modeling method leveraging prototype networks. ProtoTopic constructs learnable semantic prototypes and performs topic assignment and generation via metric learning, measuring document–prototype distances. Its key contributions are: (1) modeling resource-constrained medical topics using prototypes that jointly preserve semantic coherence and diversity; and (2) enabling interpretable, annotation-free topic inference with strong cross-domain generalizability. Extensive experiments across multiple medical corpora demonstrate that ProtoTopic significantly outperforms state-of-the-art baselines in both topic coherence and distinctiveness—validating its effectiveness in discovering high-quality, clinically meaningful topics from limited data.
📝 Abstract
Topic modeling is a useful tool for analyzing large corpora of written documents, particularly academic papers. Despite a wide variety of proposed topic modeling techniques, these techniques do not perform well when applied to medical texts. This can be due to the low number of documents available for some topics in the healthcare domain. In this paper, we propose ProtoTopic, a prototypical network-based topic model used for topic generation for a set of medical paper abstracts. Prototypical networks are efficient, explainable models that make predictions by computing distances between input datapoints and a set of prototype representations, making them particularly effective in low-data or few-shot learning scenarios. With ProtoTopic, we demonstrate improved topic coherence and diversity compared to two topic modeling baselines used in the literature, demonstrating the ability of our model to generate medically relevant topics even with limited data.