Building Patient Journeys in Hebrew: A Language Model for Clinical Timeline Extraction

📅 2025-12-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Clinical timeline extraction from Hebrew electronic health records (EHRs) is hindered by the lack of domain-specific language models, which limits accurate temporal reasoning in Hebrew clinical NLP. Method: We introduce the first continually pre-trained language model for Hebrew clinical text, built upon DictaBERT 2.0, and propose a lexically adaptive tokenization strategy that improves the handling of Hebrew morphology. We construct de-identified clinical timeline datasets annotated for temporal relations in two domains (internal/emergency medicine and oncology). We show empirically that de-identification preserves downstream performance while supporting privacy compliance. Contribution/Results: Our model achieves state-of-the-art performance on the two newly released Hebrew clinical timeline benchmark datasets. The model and datasets are released for research use under ethically reviewed, privacy-preserving protocols, supporting reproducible, compliant research in Hebrew clinical NLP.

πŸ“ Abstract
We present a new Hebrew medical language model designed to extract structured clinical timelines from electronic health records, enabling the construction of patient journeys. Our model is based on DictaBERT 2.0 and continually pre-trained on over five million de-identified hospital records. To evaluate its effectiveness, we introduce two new datasets annotated for event temporal relations: one from internal medicine and emergency departments, and another from oncology. Our results show that our model achieves strong performance on both datasets. We also find that vocabulary adaptation improves token efficiency and that de-identification does not compromise downstream performance, supporting privacy-conscious model development. The model is made available for research use under ethical restrictions.
Problem

Extracts structured clinical timelines from Hebrew health records
Constructs patient journeys using a specialized Hebrew medical language model
Evaluates the model on temporal-relation datasets from multiple medical departments
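The page does not say how extracted temporal relations are assembled into a patient journey. The following is a minimal sketch that assumes a hypothetical three-label scheme (BEFORE, AFTER, OVERLAP) over event pairs. It merges overlapping events into one timeline slot with union-find, then orders the slots with Kahn's topological sort. `build_timeline`, the label names, and the toy events are illustrative assumptions. They are not the authors' pipeline or annotation scheme.

```python
from collections import defaultdict
import heapq

# Hypothetical label set for illustration; the paper's scheme may differ.
BEFORE, AFTER, OVERLAP = "BEFORE", "AFTER", "OVERLAP"


def build_timeline(events, relations):
    """Order events into a patient journey from pairwise temporal relations.

    events: list of event mentions (strings); must cover every event in relations.
    relations: list of (event_a, label, event_b) triples.
    Returns a list of slots; events in the same slot overlap in time.
    """
    # Union-find: events linked by OVERLAP share one timeline slot.
    parent = {e: e for e in events}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for a, label, b in relations:
        if label == OVERLAP:
            parent[find(a)] = find(b)

    # Precedence edges between slots: u -> v means slot u happens before slot v.
    successors = defaultdict(set)
    indegree = {find(e): 0 for e in events}
    for a, label, b in relations:
        if label == OVERLAP:
            continue
        if label == BEFORE:
            u, v = find(a), find(b)
        elif label == AFTER:
            u, v = find(b), find(a)
        else:
            raise ValueError(f"unknown label: {label}")
        if u == v:
            raise ValueError(f"{a} {label} {b} contradicts an OVERLAP link")
        if v not in successors[u]:
            successors[u].add(v)
            indegree[v] += 1

    # Kahn's topological sort; the heap breaks ties alphabetically.
    ready = [slot for slot, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        slot = heapq.heappop(ready)
        order.append(slot)
        for nxt in successors[slot]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(indegree):
        raise ValueError("BEFORE/AFTER relations contain a cycle")

    # Expand each slot back into its (sorted) member events.
    members = defaultdict(list)
    for e in events:
        members[find(e)].append(e)
    return [sorted(members[slot]) for slot in order]


if __name__ == "__main__":
    # Toy, made-up example (not from the paper's data).
    events = ["admission", "fever", "CT scan", "chemotherapy", "discharge"]
    relations = [
        ("fever", OVERLAP, "admission"),
        ("admission", BEFORE, "CT scan"),
        ("chemotherapy", AFTER, "CT scan"),
        ("discharge", AFTER, "chemotherapy"),
    ]
    print(build_timeline(events, relations))
    # [['admission', 'fever'], ['CT scan'], ['chemotherapy'], ['discharge']]
```

This sketch raises `ValueError` on contradictory or cyclic predictions rather than silently resolving them. A real system would need a policy for such conflicts, for example keeping the higher-confidence relation.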
Innovation

Hebrew medical language model for clinical timeline extraction
Continual pre-training on de-identified hospital records
Vocabulary adaptation improves token efficiency; de-identification preserves downstream performance
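The page reports that vocabulary adaptation improves token efficiency but does not say how efficiency is measured. One common measure is fertility, the average number of subword tokens per word. The following is a minimal sketch that assumes a BERT-style WordPiece tokenizer (greedy longest-match-first, with `##` marking word-internal pieces) as a stand-in. `char_vocab`, `wordpiece`, `count_tokens`, `fertility`, the toy vocabularies, and the Hebrew example sentence are all hypothetical. They are not the authors' tokenizer or adaptation procedure.

```python
def char_vocab(text):
    """Character-level base vocabulary in WordPiece style:
    word-initial pieces are bare, word-internal pieces carry '##'."""
    chars = {c for c in text if not c.isspace()}
    return chars | {"##" + c for c in chars}


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenization of one word."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid split: the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens


def count_tokens(text, vocab):
    """Total subword tokens over whitespace-separated words."""
    return sum(len(wordpiece(w, vocab)) for w in text.split())


def fertility(text, vocab):
    """Average number of subword tokens per whitespace-separated word."""
    return count_tokens(text, vocab) / len(text.split())


if __name__ == "__main__":
    # Toy sentence "the patient received chemotherapy" (not from the paper's data).
    patient, received, chemo = "המטופל", "קיבל", "כימותרפיה"
    text = f"{patient} {received} {chemo}"
    base = char_vocab(text)
    # Hypothetical adaptation: add the stem that follows the prefix ה ("the")
    # as a continuation piece, plus one whole clinical term.
    adapted = base | {"##" + patient[1:], chemo}
    print(fertility(text, base), fertility(text, adapted))
```

With the adapted vocabulary, המטופל splits into two pieces (the prefix ה plus the stem) instead of one piece per letter, and כימותרפיה becomes a single token. A morphology-aware vocabulary targets exactly this kind of saving.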
Kai Golan Hashiloni
Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
Brenda Kasabe Nokai
Tel Aviv Sourasky Medical Center, Israel
Michal Shevach
Tel Aviv Sourasky Medical Center, Israel
Esthy Shemesh
Tel Aviv Sourasky Medical Center, Israel
Ronit Bartin
Tel Aviv Sourasky Medical Center, Israel
Anna Bergrin
Tel Aviv Sourasky Medical Center, Israel
Liran Harel
Tel Aviv Sourasky Medical Center, Israel
Nachum Dershowitz
School of Computer Science and AI, Tel Aviv University, Israel
Liat Nadai Arad
Tel Aviv Sourasky Medical Center, Israel
Kfir Bar
Efi Arazi School of Computer Science, Reichman University (IDC Herzliya)
Natural Language Processing