🤖 AI Summary
This study addresses the significant performance degradation of clinical text section segmentation models when transferred across domains, such as from MIMIC-III to obstetrics, which limits their applicability in diverse clinical settings. To this end, the authors construct and publicly release the first annotated obstetric clinical notes dataset for section segmentation. They systematically evaluate both supervised Transformer-based models and zero-shot large language models (LLMs) on in-domain and cross-domain tasks. Results show that supervised models achieve strong in-domain performance but suffer substantial drops in cross-domain settings, whereas zero-shot LLMs demonstrate superior cross-domain adaptability once hallucinated section headers are corrected. This work presents the first direct comparison between these two paradigms for clinical section segmentation, highlighting the potential of zero-shot approaches for structuring clinical text across heterogeneous domains.
📝 Abstract
Clinical free-text notes contain vital patient information and are organized into labeled sections; recognizing these sections has been shown to support clinical decision-making and downstream NLP tasks. In this paper, we advance clinical section segmentation through three key contributions. First, we curate a new de-identified, section-labeled obstetrics notes dataset to supplement the medical domains covered in public corpora such as MIMIC-III, on which most existing segmentation approaches are trained. Second, we systematically evaluate Transformer-based supervised models for section segmentation on a curated subset of MIMIC-III (in-domain) and on the new obstetrics dataset (out-of-domain). Third, we conduct the first head-to-head comparison of supervised models for medical section segmentation with zero-shot large language models. Our results show that while supervised models perform strongly in-domain, their performance drops substantially out-of-domain. In contrast, zero-shot models demonstrate robust out-of-domain adaptability once hallucinated section headers are corrected. These findings underscore the importance of developing domain-specific clinical resources and highlight zero-shot segmentation as a promising direction for applying healthcare NLP beyond well-studied corpora, provided that hallucinations are appropriately managed.