Correcting heterogeneous diagnostic bias when developing clinical prediction models using causal hidden Markov models

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Clinical prediction models often suffer from label bias because diagnostic testing frequency differs across subpopulations (for example, those defined by sex, ethnicity, or diabetes status), which leads to systematic prediction errors and distorted performance evaluation. This work proposes a framework that combines causal inference with a hidden Markov model to define a counterfactual target: the probability that an individual would be diagnosed under the reference group’s diagnostic rate. By explicitly modeling the latent disease progression process and the observed test results, the method corrects for bias arising from differential diagnostic delay. In simulations, it reduces the Observed:Expected ratio for the underdiagnosed group from 1.34 to 1.02. Applied to real-world chronic kidney disease data, it improves calibration for patients without diabetes, lowering their ratio from 1.55 to 1.01.
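The Observed:Expected ratio quoted above is a calibration-in-the-large measure: the number of observed events divided by the sum of predicted probabilities, so a value above 1 means the model predicts too few events. A minimal sketch of that calculation follows; the function name and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def observed_expected_ratio(outcomes, predicted_probs):
    """Calibration-in-the-large: observed events / expected events.

    outcomes: iterable of 0/1 event indicators (hypothetical data).
    predicted_probs: iterable of predicted event probabilities.
    """
    observed = sum(outcomes)
    expected = sum(predicted_probs)
    if expected == 0:
        raise ValueError("sum of predicted probabilities must be positive")
    return observed / expected


# Toy example: 2 observed events against 1.5 expected events,
# so the ratio is above 1 and the model underpredicts.
ratio = observed_expected_ratio([1, 0, 1, 0], [0.5, 0.25, 0.5, 0.25])
print(round(ratio, 2))
```

A ratio close to 1, such as the corrected values reported here, indicates that the total predicted risk matches the total observed events in that group.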

📝 Abstract

In routine care, individuals identified a priori as high risk are usually tested for conditions more frequently. Protected attributes, such as sex or ethnicity, may also determine testing frequency. Such heterogeneous detection rates across a population induce label error, which causes systematic model error for specific groups and biases performance metrics during validation. This paper proposes a method to correct prediction models for bias due to differential diagnostic delay. We use a causal inference framework to define our target estimand: an individual's diagnosis probability in a counterfactual scenario where their diagnosis rate matches that of a reference group. We model the longitudinal process as a hidden Markov model, in which confirmatory test results are emissions from a latent progressive disease stage. We validate our approach in simulated data and apply it to a case study of chronic kidney disease prediction using electronic health records. In simulations, our method reduces prediction bias and improves calibration-in-the-large, correcting the Observed:Expected ratio in the underdiagnosed group from 1.34 (standard deviation: 0.09) in a model developed without any correction for underdiagnosis bias to 1.02 (standard deviation: 0.09). Violations of assumptions in the simulation affected the estimation of model parameters, but the proposed approach nonetheless remained better calibrated than the standard model. In the clinical case study, we identify diabetes as the main driver of observability, with an odds ratio of 10.36 (95% confidence interval, 9.80 to 11.02) for the 6-month urine albumin-creatinine ratio testing rate. Using our approach to predict the counterfactual diagnostic rate in patients without diabetes, we improved the Observed:Expected ratio of a developed clinical prediction model from 1.55 (1.51 to 1.59) to 1.01 (0.98 to 1.04).
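To make the counterfactual estimand concrete, the sketch below uses a deliberately simplified stand-in for the paper's model, not the authors' implementation. It assumes a single latent disease stage with a constant per-period onset probability, a test that is performed with a group-specific probability each period, and a test with fixed sensitivity and perfect specificity. All parameter names and values are hypothetical.

```python
def diagnosis_probability(onset_prob, test_rate, sensitivity, horizon):
    """Probability of being diagnosed within `horizon` periods.

    Forward recursion over three states: healthy, diseased but
    undiagnosed (latent), and diagnosed. Disease is progressive, so
    there is no transition back to healthy. All inputs are
    illustrative assumptions.
    """
    healthy, undiagnosed, diagnosed = 1.0, 0.0, 0.0
    # A latent case is detected only if a test happens and is positive.
    detect = test_rate * sensitivity
    for _ in range(horizon):
        # Latent transition: some healthy individuals develop disease.
        new_cases = healthy * onset_prob
        healthy -= new_cases
        undiagnosed += new_cases
        # Emission: a positive confirmatory test reveals the latent state.
        found = undiagnosed * detect
        undiagnosed -= found
        diagnosed += found
    return diagnosed


# Same disease process, two testing regimes (hypothetical rates).
observed = diagnosis_probability(0.02, test_rate=0.1, sensitivity=0.9, horizon=10)
counterfactual = diagnosis_probability(0.02, test_rate=0.6, sensitivity=0.9, horizon=10)
print(f"own testing rate: {observed:.3f}")
print(f"reference testing rate: {counterfactual:.3f}")
```

In this toy setting, the gap between the two printed values is the label error that a model trained on observed diagnoses would inherit for the less frequently tested group. Swapping in the reference group's testing rate while holding the disease process fixed gives the counterfactual diagnosis probability.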

Problem

Research questions and friction points this paper is trying to address.

diagnostic bias

label error

heterogeneous testing

clinical prediction models

underdiagnosis

Innovation

Methods, ideas, or system contributions that make the work stand out.

causal inference

hidden Markov model

diagnostic bias