🤖 AI Summary
Clinical record variability in dementia diagnosis degrades Medicare claims data quality, leading to diminished diagnostic specificity over time—a phenomenon newly termed “diagnostic signal attenuation,” wherein specific codes (e.g., Alzheimer’s disease) are systematically replaced by nonspecific ones.
Method: Leveraging 2016–2018 U.S. Medicare inpatient data, we integrate clinical knowledge–driven 17-dimensional ICD-10 grouping, temporal sequential pattern mining (tSPM+), and matrix similarity analysis to quantify signal attenuation across spatiotemporal dimensions.
Contribution/Results: Nonspecific coding dominates dementia-related claims; significant temporal degradation is observed; our model explains 38% of geographic variation in attenuation and identifies rural areas, minority populations, and Medicaid-covered regions as high-attenuation zones. This work establishes a computationally tractable, interpretable paradigm for evaluating AI-ready healthcare data quality—bridging clinical semantics, temporal dynamics, and population-level disparities.
📝 Abstract
Background: Artificial intelligence (AI) models in healthcare depend on the fidelity of diagnostic data, yet the quality of such data is often compromised by variability in clinical documentation practices. In dementia, a condition already prone to diagnostic ambiguity, this variability may introduce systematic distortion into claims-based research and AI model development. Methods: We analyzed Medicare Part A hospitalization data from 2016-2018 to examine patterns of dementia-related ICD-10 code utilization across more than 3,000 U.S. counties. Using a clinically informed classification of 17 ICD-10 codes grouped into five diagnostic categories, we applied the transitive Sequential Pattern Mining (tSPM+) algorithm to model temporal usage structures. We then used matrix similarity methods to compare local diagnostic patterns to national norms and fit multivariable linear regressions to identify county-level demographic and structural correlates of divergence. Findings: We found substantial geographic and demographic variation in dementia-related diagnostic code usage. Non-specific codes were dominant nationwide, while Alzheimer's disease and vascular dementia codes showed pronounced variability. Temporal sequence analysis revealed consistent transitions from specific to non-specific codes, which suggest degradation of diagnostic specificity over time. Counties with higher proportions of rural residents, Medicaid-eligible patients, and Black or Hispanic dementia patients demonstrated significantly lower similarity to national usage patterns. Our model explained 38% of the variation in local-to-national diagnostic alignment.