Learning Normal Representations for Blood Biomarkers

📅 2026-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a limitation of conventional blood biomarker interpretation: fixed population-based reference intervals ignore each individual's baseline and can mask meaningful deviations, risking delayed disease detection. Purely personalized approaches account for individuality but overfit to sparse testing histories, which inflates false-positive rates; in this study they classified up to 68% of measurements as abnormal without corresponding associations with adverse clinical outcomes. To reconcile the two, the authors propose NORMA, a conditional Transformer framework that generates reference intervals conditioned on both a patient's longitudinal history and population-level data about “normal” variation. Using nearly two billion longitudinal laboratory measurements from over 1.6 million individuals across North America, the Middle East, and East Asia, they show that NORMA-derived intervals achieve higher precision for predicting clinical outcomes, including mortality, acute kidney injury, and chronic disease, and that anchoring individual trajectories to population-level priors outperforms either approach alone. The model, code, and an interactive user interface are publicly released.
📝 Abstract
Blood-based biomarkers underpin clinical diagnosis and management, yet their interpretation relies largely on fixed population reference intervals that ignore stable, intra-patient variability. As such, population-based interpretation can mask meaningful deviation from an individual's baseline, risking delayed disease detection. To remedy this, there have been increasing efforts to personalize blood biomarker interpretation using individual testing histories. However, these methods may overfit to sparse data, inflating false-positive rates and unnecessary follow-up, and can also unwittingly include unrecognized or subclinical disease. Here, we leverage nearly 2 billion longitudinal laboratory measurements from over 1.6 million individuals across North America, the Middle East, and East Asia, to show that while laboratory values are highly individual, purely personalized intervals routinely overfit, classifying up to 68% of measurements as abnormal, without corresponding associations with adverse clinical outcomes. We then introduce NORMA, a conditional transformer-based framework that generates reference intervals by conditioning on both a patient's history and population-level data about "normal" variation. NORMA-derived intervals achieve higher precision for predicting outcomes, including mortality, acute kidney injury, and chronic disease. These findings caution against over-personalization in laboratory medicine and demonstrate that anchoring individual trajectories to population-level priors outperforms either approach alone. To promote transparency, we publicly release the model, code, and an interactive user interface for accessible, individualized laboratory interpretation.
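The trade-off described in the abstract can be illustrated with a small sketch. This is not NORMA, which is a conditional transformer; it swaps in a simple normal-normal shrinkage estimate to show the general idea of anchoring a sparse personal history to a population prior. All numbers (a creatinine-like analyte in mg/dL, the population mean, and the between-person and within-person standard deviations) and all function names are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

Z95 = 1.96  # two-sided 95% quantile of the standard normal


def population_interval(pop_mean, pop_sd):
    """Fixed reference interval shared by everyone."""
    half = Z95 * pop_sd
    return (pop_mean - half, pop_mean + half)


def personal_interval(history):
    """Interval built only from one person's own (possibly sparse) history."""
    center, half = mean(history), Z95 * stdev(history)
    return (center - half, center + half)


def anchored_interval(history, pop_mean, between_sd, within_sd):
    """Normal-normal shrinkage (illustrative stand-in, not the NORMA model).

    The personal mean is pulled toward the population mean, and the width
    uses a population-level estimate of within-person variation instead of
    the noisy spread of a few personal measurements.
    """
    n = len(history)
    data_precision = n / within_sd**2
    prior_precision = 1 / between_sd**2
    weight = data_precision / (data_precision + prior_precision)
    center = weight * mean(history) + (1 - weight) * pop_mean
    # Uncertainty about the baseline plus ordinary within-person variation.
    predictive_sd = (1 / (data_precision + prior_precision) + within_sd**2) ** 0.5
    half = Z95 * predictive_sd
    return (center - half, center + half)


def is_flagged(value, interval):
    low, high = interval
    return not (low <= value <= high)


# Hypothetical patient with three very similar prior results.
history = [0.70, 0.71, 0.70]
pop = population_interval(0.90, 0.20)
personal = personal_interval(history)
anchored = anchored_interval(history, pop_mean=0.90, between_sd=0.18, within_sd=0.08)

for value in (0.74, 0.95):
    print(value, is_flagged(value, pop), is_flagged(value, personal), is_flagged(value, anchored))
```

With these assumed inputs, the purely personal interval is so narrow that a small change to 0.74 is flagged, while the population interval does not flag 0.95 even though it is well above this patient's baseline. The anchored interval sits between the two, which is the qualitative behavior the abstract argues for.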
Problem

Research questions and friction points this paper is trying to address.

blood biomarkers
reference intervals
personalized interpretation
overfitting
intra-patient variability
Innovation

Methods, ideas, or system contributions that make the work stand out.

personalized reference intervals
conditional transformer
longitudinal biomarker analysis
overfitting mitigation
population priors
Authors

Aashna P. Shah
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Michelle M. Li
Research Fellow, Harvard Medical School
Yash Lal
Department of Mathematics, Johns Hopkins University, Baltimore, MD, USA
Seffi Cohen
Ben Gurion University
AI, ML, Ensemble Methods
Liat F. Antwarg
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Morgan Sanchez
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
James A. Diao
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Chirag J. Patel
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Ben Y. Reis
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Ran D. Balicer
The Ivan and Francesca Berkowitz Family Living Laboratory Collaboration at Harvard Medical School and Clalit Research Institute, USA and Israel
Noa Dagan
Clalit Research Institute and Ben-Gurion University, Israel
Clinical prediction models, Causal inference, Algorithmic fairness
Arjun K. Manrai
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA