Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints

📅 2026-04-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses key challenges in clinical risk prediction from electronic health records (EHRs), including high-dimensional heterogeneous data, extreme class imbalance, and cross-cohort distribution shift, that limit the effectiveness of existing tabular in-context learning (TICL) approaches. The authors establish a multi-cohort EHR benchmark that compares classical, deep tabular, and TICL models across data scale, feature dimensionality, outcome rarity, and cross-cohort generalization. They propose AWARE, a task-aligned retrieval framework that combines supervised embedding learning with lightweight adapters to improve retrieval quality and retrieval-inference alignment. Experiments show that AWARE improves AUPRC by up to 12.2% under extreme class imbalance, with gains increasing as data complexity grows. The results identify retrieval quality and retrieval-inference alignment as key bottlenecks for deploying TICL in clinical prediction.
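
To make the reported metric concrete, the following is a minimal sketch of how AUPRC is commonly estimated as average precision on a rare-outcome ranking. The function name `average_precision`, the toy labels, and the toy scores are illustrative assumptions; the source does not say which AUPRC estimator the authors use, and this sketch breaks score ties by input order.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision@rank taken at each positive example.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank examples from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy cohort (assumed data): 1 positive among 10 patients, ranked second.
labels = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)
print(f"AP = {ap:.2f}, prevalence = {prevalence:.2f}")
```

A ranker with no signal scores roughly the outcome prevalence on this metric, so under extreme imbalance AUPRC values are best read against that baseline rather than against 0.5.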

📝 Abstract

Clinical prediction from structured electronic health records (EHRs) is challenging due to high dimensionality, heterogeneity, class imbalance, and distribution shift. While tabular in-context learning (TICL) and retrieval-augmented methods perform well on generic benchmarks, their behavior in clinical settings remains unclear. We present a multi-cohort EHR benchmark comparing classical, deep tabular, and TICL models across varying data scale, feature dimensionality, outcome rarity, and cross-cohort generalization. PFN-based TICL models are sample-efficient in low-data regimes but degrade under naive distance-based retrieval as heterogeneity and imbalance increase. We propose AWARE, a task-aligned retrieval framework using supervised embedding learning and lightweight adapters. AWARE improves AUPRC by up to 12.2% under extreme imbalance, with gains increasing with data complexity. Our results identify retrieval quality and retrieval-inference alignment as key bottlenecks for deploying tabular in-context learning in clinical prediction.
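
The contrast between naive distance-based retrieval and task-aligned retrieval can be illustrated with a toy sketch. This is not AWARE: the per-feature `weights` below are a hand-set stand-in for a supervised embedding, and `retrieve_context`, the rows, and the labels are all hypothetical. It only shows why the choice of distance changes which examples reach the in-context model.

```python
from typing import List, Optional, Sequence


def retrieve_context(
    query: Sequence[float],
    rows: Sequence[Sequence[float]],
    k: int,
    weights: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return indices of the k rows closest to the query.

    With weights=None this is plain squared Euclidean distance (naive retrieval).
    Non-uniform weights stand in for a learned, task-aligned embedding (assumption).
    """
    if weights is None:
        weights = [1.0] * len(query)

    def dist(row: Sequence[float]) -> float:
        return sum(w * (q - x) ** 2 for w, q, x in zip(weights, query, row))

    ranked = sorted(range(len(rows)), key=lambda i: dist(rows[i]))
    return ranked[:k]


# Assumed toy data: feature 0 carries the outcome signal, feature 1 is
# large-scale noise unrelated to the label.
rows = [
    (0.9, 10.0), (1.0, -8.0), (1.1, 9.0),   # positives
    (0.0, 0.5), (0.1, -0.5), (-0.1, 0.2),   # negatives
]
labels = [1, 1, 1, 0, 0, 0]
query = (1.0, 0.0)  # resembles the positives on the informative feature

naive = retrieve_context(query, rows, k=3)
aligned = retrieve_context(query, rows, k=3, weights=(1.0, 0.0))
print("naive context labels:  ", [labels[i] for i in naive])
print("aligned context labels:", [labels[i] for i in aligned])
```

In this toy setup the naive distance is dominated by the noisy feature and retrieves only negatives, while the weighted distance retrieves the positives. A learned embedding would replace the fixed weights in a real system.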

Problem

Research questions and friction points this paper is trying to address.

clinical risk prediction

electronic health records

tabular foundation models

class imbalance

distribution shift

Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-aligned

tabular foundation models

in-context learning