🤖 AI Summary
This paper addresses two problems: identifying sparse multi-organ lesions in Hebrew-language radiology reports for Crohn’s disease, and the poor performance of structured information extraction in low-resource languages. It proposes HSMP-BERT, a hierarchical prompt-learning model that combines hierarchical multi-label classification with fine-grained organ–pathology matching and supports both zero-shot and fully fine-tuned use. Evaluated on 24 organ–finding combinations, HSMP-BERT achieves a mean F1 score of 0.83, outperforming both the zero-shot SMP baseline (0.49) and standard fine-tuning (0.30), while hierarchical inference runs 5.1× faster than traditional inference. Performance is assessed with several metrics, including F1, Cohen’s κ, and AUC. Applied to the full set of reports, the model reveals clinical associations (e.g., among ileal wall thickening, stenosis, and pre-stenotic dilatation) and demographic trends (e.g., age- and sex-specific patterns in inflammatory findings). This work provides a scalable technical foundation for large-scale epidemiological studies of gastrointestinal diseases in low-resource language settings.
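The source does not spell out how hierarchical inference is implemented, so the following is only a minimal sketch of one plausible reading: an organ-level gate is queried first, and pathology-level queries run only for organs flagged as having a finding. The names `ORGANS`, `PATHOLOGIES`, `organ_has_finding`, and `classify_pair` are hypothetical placeholders, and the toy classifiers stand in for the model's prompt-based predictions. The grid size (6 organs by 15 pathologies) follows the abstract.

```python
from typing import Callable, Dict, Set, Tuple

# Placeholder names; the paper's actual organ and pathology lists are not given here.
ORGANS = [f"organ_{i}" for i in range(6)]
PATHOLOGIES = [f"pathology_{j}" for j in range(15)]

Pair = Tuple[str, str]
Report = Set[Pair]  # toy stand-in for a report: the set of true (organ, pathology) findings


def organ_has_finding(report: Report, organ: str) -> bool:
    """Hypothetical organ-level gate: does the report mention any finding for this organ?"""
    return any(o == organ for o, _ in report)


def classify_pair(report: Report, organ: str, pathology: str) -> bool:
    """Hypothetical fine-grained query for a single organ-pathology pair."""
    return (organ, pathology) in report


def flat_inference(report: Report, pair_fn: Callable) -> Tuple[Dict[Pair, bool], int]:
    """Query every organ-pathology pair; returns labels and the number of model calls."""
    labels, calls = {}, 0
    for o in ORGANS:
        for p in PATHOLOGIES:
            labels[(o, p)] = pair_fn(report, o, p)
            calls += 1
    return labels, calls


def hierarchical_inference(
    report: Report, gate_fn: Callable, pair_fn: Callable
) -> Tuple[Dict[Pair, bool], int]:
    """Query the organ gate first; descend to pathologies only for flagged organs."""
    labels = {(o, p): False for o in ORGANS for p in PATHOLOGIES}
    calls = 0
    for o in ORGANS:
        calls += 1  # one organ-level query
        if not gate_fn(report, o):
            continue  # skip all 15 pathology queries for this organ
        for p in PATHOLOGIES:
            labels[(o, p)] = pair_fn(report, o, p)
            calls += 1
    return labels, calls


toy_report: Report = {("organ_0", "pathology_3"), ("organ_0", "pathology_7")}
flat_labels, flat_calls = flat_inference(toy_report, classify_pair)
hier_labels, hier_calls = hierarchical_inference(toy_report, organ_has_finding, classify_pair)
print(flat_calls, hier_calls)
```

In this toy setting the saving depends entirely on how many organs carry findings, so the call counts here illustrate the mechanism only and are not an attempt to reproduce the reported 5.1× speedup.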
📝 Abstract
Extracting structured clinical information from radiology reports is challenging, especially in low-resource languages. The problem is pronounced in Crohn's disease, where multi-organ findings are sparsely represented. We developed Hierarchical Structured Matching Prediction BERT (HSMP-BERT), a prompt-based model for extraction from Hebrew radiology text. In an administrative database study, we analyzed 9,683 reports from Crohn's patients imaged 2010-2023 across Israeli providers. A subset of 512 reports was radiologist-annotated for findings across six gastrointestinal organs and 15 pathologies, yielding 90 structured labels per subject. The annotated reports were divided with a multilabel-stratified split (66% train+validation; 33% test) that preserved label prevalence. Performance was evaluated with accuracy, F1, Cohen's $\kappa$, AUC, PPV, NPV, and recall. On 24 organ-finding combinations with $>$15 positives, HSMP-BERT achieved mean F1 0.83$\pm$0.08 and $\kappa$ 0.65$\pm$0.17, outperforming the SMP zero-shot baseline (F1 0.49$\pm$0.07, $\kappa$ 0.06$\pm$0.07) and standard fine-tuning (F1 0.30$\pm$0.27, $\kappa$ 0.27$\pm$0.34; paired t-test $p < 10^{-7}$). Hierarchical inference cuts runtime 5.1$\times$ vs. traditional inference. Applied to all reports, the model revealed associations among ileal wall thickening, stenosis, and pre-stenotic dilatation, plus age- and sex-specific trends in inflammatory findings. HSMP-BERT offers a scalable solution for structured extraction in radiology, enabling population-level analysis of Crohn's disease and demonstrating AI's potential in low-resource settings.
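Because the abstract reports both F1 and Cohen's $\kappa$, it may help to see how the two can diverge on a sparse label. The sketch below computes positive-class F1 and Cohen's $\kappa$ from binary labels using their standard definitions. The data are invented for illustration (a label with 30% prevalence and a classifier that always predicts "present"); they are not taken from the study and do not describe how any baseline actually behaved.

```python
from typing import List, Tuple


def confusion(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Positive-class F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true: List[int], y_pred: List[int]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e): agreement corrected for chance."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement from the marginal rates of each class.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


# Invented example: 3 of 10 reports are positive; the classifier says "present" every time.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1] * 10

f1 = f1_score(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"F1={f1:.3f}, kappa={kappa:.3f}")
```

An always-positive predictor earns a nonzero F1 from recall alone, while $\kappa$ is zero because its agreement with the reference is no better than chance. This is one reason to read the two metrics together when labels are sparse.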