Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning

📅 2026-04-12
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses a generalization bottleneck in dementia diagnosis: models for tabular data lack semantic understanding of structured variables and therefore transfer poorly across heterogeneous electronic health record (EHR) schemas. The authors propose a large language model (LLM)-based tabular representation method that converts structured clinical variables into natural language statements and encodes them with a pretrained LLM, producing transferable table embeddings. These embeddings are combined with MRI data in a multimodal diagnostic framework. The approach achieves zero-shot alignment to unseen EHR schemas without manual feature engineering or retraining. On the NACC and ADNI datasets, the method is reported to achieve state-of-the-art performance and to significantly outperform clinical baselines, including board-certified neurologists, in retrospective diagnostic tasks, suggesting that LLMs can strengthen structured clinical reasoning in heterogeneous real-world healthcare settings.

📝 Abstract
Machine learning for tabular data remains constrained by poor schema generalization, a challenge rooted in the lack of semantic understanding of structured variables. This challenge is particularly acute in domains like clinical medicine, where electronic health record (EHR) schemas vary significantly. To solve this problem, we propose Schema-Adaptive Tabular Representation Learning, a novel method that leverages large language models (LLMs) to create transferable tabular embeddings. By transforming structured variables into semantic natural language statements and encoding them with a pretrained LLM, our approach enables zero-shot alignment across unseen schemas without manual feature engineering or retraining. We integrate our encoder into a multimodal framework for dementia diagnosis, combining tabular and MRI data. Experiments on NACC and ADNI datasets demonstrate state-of-the-art performance and successful zero-shot transfer to unseen schemas, significantly outperforming clinical baselines, including board-certified neurologists, in retrospective diagnostic tasks. These results validate our LLM-driven approach as a scalable, robust solution for heterogeneous real-world data, offering a pathway to extend LLM-based reasoning to structured domains.
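
The core idea in the abstract, turning structured variables into natural language statements and then encoding them, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' implementation: the column names, the data dictionaries `SCHEMA_A` and `SCHEMA_B`, the sentence template, and the mean pooling are hypothetical, and `toy_embed` is a hashed bag-of-words stand-in for a pretrained LLM encoder so that the example runs without external dependencies.

```python
import hashlib
import math

# Hypothetical data dictionaries: each schema maps its own column name to a
# human-readable description. Names and values are illustrative only.
SCHEMA_A = {"AGE": "age in years", "MMSE": "Mini-Mental State Examination score"}
SCHEMA_B = {"PTAGE": "age in years", "MMSCORE": "Mini-Mental State Examination score"}


def serialize_row(row, schema):
    """Turn each structured variable into a natural language statement."""
    return [f"The patient's {schema[col]} is {value}." for col, value in row.items()]


def toy_embed(text, dim=64):
    """Stand-in for a pretrained LLM encoder (assumption): hashed
    bag-of-words vector, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_row(row, schema):
    """Mean-pool the statement embeddings into one row-level vector
    (the pooling choice is an assumption, not taken from the paper)."""
    vectors = [toy_embed(s) for s in serialize_row(row, schema)]
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# The same patient recorded under two schemas with different column names.
row_a = {"AGE": 74, "MMSE": 22}
row_b = {"PTAGE": 74, "MMSCORE": 22}

emb_a = embed_row(row_a, SCHEMA_A)
emb_b = embed_row(row_b, SCHEMA_B)

print(serialize_row(row_a, SCHEMA_A))
print(round(cosine(emb_a, emb_b), 3))
```

Because both schemas describe their columns with the same semantics, the two rows serialize to identical statements and land on the same embedding even though their column names never match. With a real pretrained encoder, differently worded but semantically close descriptions would be expected to map to nearby vectors rather than identical ones, which is the property that zero-shot schema alignment relies on.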
Problem

Research questions and friction points this paper is trying to address.

schema generalization
tabular data
semantic understanding
electronic health records
clinical reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Schema-Adaptive Learning
Tabular Representation
Large Language Models
Zero-Shot Transfer
Multimodal Clinical Reasoning