AI Summary
Clinical timeline extraction from Hebrew electronic health records (EHRs) lacks domain-specific language models, hindering accurate temporal reasoning in clinical NLP. Method: We introduce the first continually pre-trained language model for Hebrew clinical text, built upon DictaBERT 2.0, and propose a novel lexically adaptive tokenization strategy to enhance morphological processing of Hebrew. We construct a de-identified, temporally annotated clinical timeline dataset spanning two domains (internal/emergency medicine and oncology), empirically validating that de-identification preserves downstream performance while ensuring privacy compliance. Contribution/Results: Our model achieves state-of-the-art performance on two newly released Hebrew clinical timeline benchmark datasets. Both the model and dataset are publicly released under ethically reviewed, privacy-preserving protocols, supporting reproducible, compliant research in Hebrew clinical NLP.
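As a hedged illustration of the vocabulary-adaptation step named above (the paper's exact procedure is not detailed here), the sketch below extends a BERT-style tokenizer with in-domain Hebrew terms and resizes the model's embedding matrix before continual pre-training. The checkpoint name and the example tokens are assumptions for illustration, not the released artifacts.

```python
# Minimal sketch of lexical/vocabulary adaptation, assuming HuggingFace
# transformers and a public DictaBERT-family checkpoint (name illustrative).
from transformers import AutoTokenizer, AutoModelForMaskedLM

checkpoint = "dicta-il/dictabert"  # assumption: a DictaBERT-family checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Hypothetical in-domain terms; in practice these would be mined by frequency
# from the de-identified clinical corpus.
new_tokens = ["אשפוז", "כימותרפיה"]  # "hospitalization", "chemotherapy"
tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix so the new token ids get (randomly initialized)
# rows; continual pre-training then learns them jointly with the original
# vocabulary.
model.resize_token_embeddings(len(tokenizer))
```

Adding whole clinical words in this way reduces over-segmentation of morphologically rich Hebrew terms, which is one plausible mechanism behind the token-efficiency gains reported below.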
Abstract
We present a new Hebrew medical language model designed to extract structured clinical timelines from electronic health records, enabling the construction of patient journeys. Our model is based on DictaBERT 2.0 and continually pre-trained on over five million de-identified hospital records. To evaluate its effectiveness, we introduce two new datasets -- one from internal medicine and emergency departments, and another from oncology -- annotated with temporal relations between clinical events. Experiments show that the model achieves strong performance on both datasets. We also find that vocabulary adaptation improves token efficiency and that de-identification does not compromise downstream performance, supporting privacy-conscious model development. The model is made available for research use under ethical restrictions.
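To make the token-efficiency claim concrete: a common measure is "fertility", the average number of subword tokens per whitespace word, where a lower value on clinical text indicates a better-adapted vocabulary. Below is a minimal sketch of that measurement, assuming HuggingFace tokenizers; both checkpoint paths and the sample sentence are placeholders, not the paper's evaluation setup.

```python
# Sketch: compare tokenizer fertility before and after vocabulary adaptation.
from transformers import AutoTokenizer

def fertility(tokenizer, texts):
    """Average number of subword tokens per whitespace-separated word."""
    n_words = sum(len(t.split()) for t in texts)
    n_tokens = sum(len(tokenizer.tokenize(t)) for t in texts)
    return n_tokens / n_words

# Placeholder paths: a base checkpoint and a vocabulary-adapted variant.
base = AutoTokenizer.from_pretrained("dicta-il/dictabert")
adapted = AutoTokenizer.from_pretrained("path/to/clinical-adapted-tokenizer")

sample = ["החולה אושפז במחלקה הפנימית"]  # "The patient was admitted to internal medicine"
print(f"base fertility:    {fertility(base, sample):.2f}")
print(f"adapted fertility: {fertility(adapted, sample):.2f}")
```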