🤖 AI Summary
This study addresses poor schema generalization in machine learning for tabular data, a bottleneck for dementia diagnosis because electronic health record (EHR) schemas vary widely and models lack a semantic understanding of structured variables. The authors propose Schema-Adaptive Tabular Representation Learning, a large language model (LLM)-based method that converts structured clinical variables into natural language statements and encodes them with a pretrained LLM to produce transferable tabular embeddings. These embeddings are combined with MRI data in a multimodal diagnostic framework. The approach aligns unseen EHR schemas zero-shot, without manual feature engineering or retraining. In experiments on the NACC and ADNI datasets, the method significantly outperforms clinical baselines, including board-certified neurologists, in retrospective diagnostic tasks, suggesting that LLMs can extend semantic reasoning to structured clinical data in heterogeneous real-world settings.
📝 Abstract
Machine learning for tabular data remains constrained by poor schema generalization, a challenge rooted in the lack of semantic understanding of structured variables. This challenge is particularly acute in domains like clinical medicine, where electronic health record (EHR) schemas vary significantly. To solve this problem, we propose Schema-Adaptive Tabular Representation Learning, a novel method that leverages large language models (LLMs) to create transferable tabular embeddings. By transforming structured variables into semantic natural language statements and encoding them with a pretrained LLM, our approach enables zero-shot alignment across unseen schemas without manual feature engineering or retraining. We integrate our encoder into a multimodal framework for dementia diagnosis, combining tabular and MRI data. Experiments on NACC and ADNI datasets demonstrate state-of-the-art performance and successful zero-shot transfer to unseen schemas, significantly outperforming clinical baselines, including board-certified neurologists, in retrospective diagnostic tasks. These results validate our LLM-driven approach as a scalable, robust solution for heterogeneous real-world data, offering a pathway to extend LLM-based reasoning to structured domains.
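The core idea in the abstract, turning structured variables into natural language statements and then encoding them, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the column names and descriptions, the sentence template, and the helper names `serialize_row`, `toy_encode`, and `embed_row`. In particular, `toy_encode` is a hashed bag-of-words stand-in for the pretrained LLM encoder, used only so the example runs without external dependencies.

```python
import hashlib
import math
import re

# Hypothetical column descriptions for two schemas that record the same
# clinical variables under different column names (illustrative only).
SCHEMA_A = {"AGE": "age in years", "MMSE": "Mini-Mental State Examination score"}
SCHEMA_B = {"pt_age": "age in years", "mmse_total": "Mini-Mental State Examination score"}


def serialize_row(row, descriptions):
    """Turn each (column, value) pair into one natural language statement."""
    statements = []
    for column, value in row.items():
        if value is None:
            continue  # skip missing values instead of describing them
        label = descriptions.get(column, column)  # fall back to the raw column name
        statements.append(f"The patient's {label} is {value}.")
    return statements


def toy_encode(text, dim=64):
    """Stand-in for a pretrained LLM encoder: L2-normalized hashed bag-of-words."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        idx = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_row(row, descriptions):
    """Serialize a row to text, then encode the text into a fixed-size vector."""
    return toy_encode(" ".join(serialize_row(row, descriptions)))


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


row_a = {"AGE": 72, "MMSE": 24}
row_b = {"pt_age": 72, "mmse_total": 24}
similarity = cosine(embed_row(row_a, SCHEMA_A), embed_row(row_b, SCHEMA_B))
```

Because both rows serialize to the same statements, their embeddings coincide even though the column names differ. This is a toy version of the schema alignment the abstract describes; with a real pretrained encoder, semantically similar but differently worded descriptions would be expected to land close together rather than match exactly.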