A Multi-Dimensional Clustering Approach for Identifying Inborn Errors of Immunity

📅 2026-05-15
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses the diagnostic delays in inborn errors of immunity (IEI) caused by the sparsity and structural complexity of electronic health record (EHR) data, which hinder effective data-driven analysis. To overcome this challenge, the authors propose an innovative pipeline that standardizes EHR data from national registries and transforms it into vector representations. By integrating feature engineering with unsupervised clustering algorithms optimized through hyperparameter tuning, the method enables automatic identification of multidimensional phenotypic patterns in IEI. The approach not only establishes a structured data framework tailored for rare disease analysis but also introduces a dedicated toolkit that substantially enhances the discovery of potential IEI subtypes. This work provides a scalable, data-driven solution to support early screening and research in rare diseases.
📝 Abstract
Rare diseases such as inborn errors of immunity (IEI) require early diagnosis to prevent end-organ damage and improve quality of life. Hurdles in accessing and curating large-scale electronic health record (EHR) data limit the routine data-driven analyses needed to stay at the forefront of trends in IEI and other rare diseases. Few machine learning (ML) algorithms have been developed for pattern recognition in IEI, and published methodology on how to systematically process and integrate complex medical data is limited. Our proposed pipeline, which includes data curation and ML clustering algorithms, is designed to recognize novel rare disease patterns and extract IEI-associated features from a national data registry. Our methodology for EHR data formatting and processing transforms raw immunologic lab data into vectors. This is combined with hyperparameter tuning for disease pattern recognition via clustering. This study refines IEI feature awareness, develops data toolkits for the analysis of rare disease populations, and expands on transforming complex medical records into data structures interpretable by unsupervised ML.
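The abstract does not name the vectorization scheme, the clustering algorithm, or the tuning criterion. As a minimal sketch only, the steps it describes (lab records to vectors, then clustering with a tuned hyperparameter) might look like the following. Everything specific here is an assumption made for illustration: median imputation, z-scoring, k-means with a deterministic farthest-point initialization, silhouette-based selection of the cluster count, the feature names `IgG`, `IgA`, `CD4`, and the synthetic values.

```python
import math
from statistics import mean, median, pstdev


def to_vectors(records, tests):
    """Pivot long-format (patient, test, value) rows into one z-scored vector per patient.

    Assumes every test has at least one observed value.
    """
    by_patient = {}
    for pid, test, value in records:
        by_patient.setdefault(pid, {})[test] = value
    patients = sorted(by_patient)
    columns = []
    for t in tests:
        observed = [by_patient[p][t] for p in patients if t in by_patient[p]]
        fill = median(observed)  # assumption: impute missing labs with the cohort median
        col = [by_patient[p].get(t, fill) for p in patients]
        mu, sd = mean(col), pstdev(col) or 1.0  # guard against constant columns
        columns.append([(v - mu) / sd for v in col])
    return patients, [list(row) for row in zip(*columns)]


def kmeans(X, k, iters=100):
    """Plain k-means with farthest-point initialization (deterministic)."""
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(math.dist(x, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(x, centers[j])) for x in X]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


def silhouette(X, labels):
    """Mean silhouette score; points in singleton clusters score 0."""
    scores = []
    for i, x in enumerate(X):
        def avg_dist(c):
            d = [math.dist(x, y) for j, y in enumerate(X) if labels[j] == c and j != i]
            return mean(d) if d else None
        a = avg_dist(labels[i])
        b_vals = [v for v in (avg_dist(c) for c in set(labels) if c != labels[i])
                  if v is not None]
        if a is None or not b_vals or max(a, min(b_vals)) == 0:
            scores.append(0.0)
            continue
        b = min(b_vals)
        scores.append((b - a) / max(a, b))
    return mean(scores)


def tune_k(X, candidates):
    """Hyperparameter tuning: score each candidate k and keep the best silhouette."""
    results = {k: silhouette(X, kmeans(X, k)) for k in candidates}
    return max(results, key=results.get), results


# Synthetic, hypothetical lab rows: (patient id, test name, value).
records = [
    ("p1", "IgG", 1000), ("p1", "IgA", 200), ("p1", "CD4", 800),
    ("p2", "IgG", 1100), ("p2", "IgA", 220),                      # CD4 missing
    ("p3", "IgG", 950),  ("p3", "IgA", 180), ("p3", "CD4", 850),
    ("p4", "IgG", 200),  ("p4", "IgA", 20),  ("p4", "CD4", 300),
    ("p5", "IgG", 250),  ("p5", "IgA", 30),  ("p5", "CD4", 280),
    ("p6", "IgG", 180),                      ("p6", "CD4", 320),  # IgA missing
]
patients, X = to_vectors(records, tests=("IgG", "IgA", "CD4"))
best_k, scores = tune_k(X, candidates=(2, 3))
print(best_k, dict(zip(patients, kmeans(X, best_k))))
```

Standardizing each lab before clustering keeps tests reported on large numeric scales from dominating the distance computation. A real registry would likely need more careful handling of missingness than a single cohort median.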
Problem

Research questions and friction points this paper is trying to address.

inborn errors of immunity
rare diseases
electronic health records
pattern recognition
data curation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-dimensional clustering
electronic health records (EHR)
inborn errors of immunity (IEI)
unsupervised machine learning
data curation pipeline
Nishad Kulkarni
Sheikh Zayed Institute for Pediatric Surgical Innovation, Children's National Hospital, Washington, DC 20010
Alexandra K. Martinson
Children's National Hospital, Washington, DC; Division of Allergy & Immunology, Children's National Hospital, Washington, DC; School of Medicine and Health Sciences, George Washington University, Washington, DC 20052
Nicholas L. Rider
Department of Health Systems & Implementation Science, Division of Allergy & Immunology, Virginia Tech Carilion School of Medicine, Roanoke, VA
Michael Keller
Children's National Hospital, Washington, DC; Division of Allergy & Immunology, Children's National Hospital, Washington, DC; School of Medicine and Health Sciences, George Washington University, Washington, DC 20052
Syed Muhammad Anwar
Children's National Hospital / George Washington University
Biomedical signal processing, medical image analysis, graph learning, self-supervised learning