Learning Normal Representations for Blood Biomarkers

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses two failure modes in blood biomarker interpretation. Fixed population-based reference intervals overlook each individual's baseline and can mask meaningful deviations, risking delayed disease detection. Purely personalized intervals account for individuality but are fitted to sparse testing histories and tend to overfit, producing high false-positive rates. To reconcile the two, the authors propose NORMA, a conditional Transformer framework that generates reference intervals conditioned on both a patient's longitudinal history and population-level priors of “normal” variation. Using nearly two billion longitudinal laboratory measurements from multiple regions, they report that NORMA-derived intervals achieve higher precision for predicting clinical outcomes, including mortality, acute kidney injury, and chronic disease, than either population-based or purely personalized intervals alone. The model, code, and an interactive user interface are publicly released.

📝 Abstract

Blood-based biomarkers underpin clinical diagnosis and management, yet their interpretation relies largely on fixed population reference intervals that ignore stable, intra-patient variability. As such, population-based interpretation can mask meaningful deviation from an individual's baseline, risking delayed disease detection. To remedy this, there have been increasing efforts to personalize blood biomarker interpretation using individual testing histories. However, these methods may overfit to sparse data, inflating false-positive rates and unnecessary follow-up, and can also unwittingly include unrecognized or subclinical disease. Here, we leverage nearly 2 billion longitudinal laboratory measurements from over 1.6 million individuals across North America, the Middle East, and East Asia, to show that while laboratory values are highly individual, purely personalized intervals routinely overfit, classifying up to 68% of measurements as abnormal, without corresponding associations with adverse clinical outcomes. We then introduce NORMA, a conditional transformer-based framework that generates reference intervals by conditioning on both a patient's history and population-level data about "normal" variation. NORMA-derived intervals achieve higher precision for predicting outcomes, including mortality, acute kidney injury, and chronic disease. These findings caution against over-personalization in laboratory medicine and demonstrate that anchoring individual trajectories to population-level priors outperforms either approach alone. To promote transparency, we publicly release the model, code, and an interactive user interface for accessible, individualized laboratory interpretation.
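The overfitting argument in the abstract can be illustrated with a small simulation. The sketch below is not NORMA and does not use the paper's data: it swaps in a simple normal-normal shrinkage estimate as a stand-in for "anchoring to a population prior", and every name and number in it (POP_MEAN, BETWEEN_SD, WITHIN_SD, the function names, the three-result history) is a hypothetical choice made for illustration. It simulates healthy patients only, so any flagged result is a false alert.

```python
import random
import statistics

Z = 1.96  # two-sided 95% quantile of the standard normal

# Hypothetical, simulated analyte (illustrative numbers, not from the paper).
POP_MEAN = 0.90    # population mean of individual set points
BETWEEN_SD = 0.15  # spread of set points across individuals
WITHIN_SD = 0.05   # spread of one healthy individual's results over time


def personalized_interval(history):
    """Mean +/- Z*SD estimated from the patient's own results only."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)  # needs at least two results
    return mean - Z * sd, mean + Z * sd


def anchored_interval(history, pop_mean, between_sd, within_sd):
    """Normal-normal shrinkage: a simple stand-in for a population prior.

    The patient's mean is pulled toward the population mean, with a weight
    that grows with the number of results in the history.
    """
    n = len(history)
    prior_precision = 1.0 / between_sd ** 2
    data_precision = n / within_sd ** 2
    total_precision = prior_precision + data_precision
    post_mean = (prior_precision * pop_mean
                 + data_precision * statistics.mean(history)) / total_precision
    # Predictive SD for the next result: set-point uncertainty plus noise.
    pred_sd = (1.0 / total_precision + within_sd ** 2) ** 0.5
    return post_mean - Z * pred_sd, post_mean + Z * pred_sd


def flag_rate(make_interval, n_patients=2000, n_history=3, seed=0):
    """Fraction of healthy patients whose next result falls outside the interval."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_patients):
        set_point = rng.gauss(POP_MEAN, BETWEEN_SD)
        history = [rng.gauss(set_point, WITHIN_SD) for _ in range(n_history)]
        next_value = rng.gauss(set_point, WITHIN_SD)
        low, high = make_interval(history)
        flagged += not (low <= next_value <= high)
    return flagged / n_patients


if __name__ == "__main__":
    personal = flag_rate(personalized_interval)
    anchored = flag_rate(
        lambda h: anchored_interval(h, POP_MEAN, BETWEEN_SD, WITHIN_SD))
    print(f"personalized false-alert rate: {personal:.3f}")
    print(f"anchored false-alert rate:     {anchored:.3f}")
```

With only three prior results, the personalized interval is built from a noisy standard deviation estimate, so it flags healthy results more often than the nominal 5%, while the anchored interval stays close to that rate under these simulated assumptions. The paper's conditional transformer learns this kind of conditioning from data rather than from fixed, known variances as assumed here.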

Problem

Research questions and friction points this paper is trying to address.

blood biomarkers

reference intervals

personalized interpretation

overfitting

intra-patient variability

Innovation

Methods, ideas, or system contributions that make the work stand out.

personalized reference intervals

conditional transformer

longitudinal biomarker analysis