🤖 AI Summary
This work addresses two obstacles to clinical prediction from electronic health records (EHRs). The first is the high dimensionality, heterogeneity, and sparsity of EHRs themselves. The second is that existing large language model (LLM) approaches are often task-agnostic and fail to incorporate predictive signals. To overcome these issues, the authors propose ReToP, a framework that trains an EHR rewriter and a predictor jointly, end to end. Because no ground-truth rewritten EHRs exist, ReToP uses clinically driven feature selection strategies to generate synthetic pseudo-labels for fine-tuning the rewriter. It also introduces a Classifier Supervised Contribution (CSC) score that aligns the rewriting process with the prediction objective. Evaluated on three clinical prediction tasks using the MIMIC-IV dataset, ReToP outperforms strong baselines. Further analysis shows that it generalizes to unseen datasets and tasks with minimal fine-tuning, produces faithful rewrites, and emphasizes task-relevant features.
📝 Abstract
Electronic Health Records (EHRs) provide crucial information for clinical decision-making. However, their high dimensionality, heterogeneity, and sparsity make clinical prediction challenging. Large Language Models (LLMs) have enabled progress on this challenge by leveraging parametric medical knowledge to enhance EHR data for clinical prediction tasks. Despite the significant achievements made so far, most existing approaches are fundamentally task-agnostic. They deploy LLMs as EHR encoders or EHR completion modules without fully integrating signals from the prediction tasks, which limits prediction accuracy. In this work, we propose Rewrite-To-Predict (ReToP), an LLM-based framework that addresses this limitation by training an EHR rewriter and a clinical predictor end to end. To cope with the lack of EHR rewrite training data, we generate synthetic pseudo-labels using clinically driven feature selection strategies, creating diverse patient rewrites for fine-tuning the EHR rewriter. ReToP aligns the rewriter with prediction objectives through a novel Classifier Supervised Contribution (CSC) score. This score enables the EHR rewriter to generate clinically relevant rewrites that directly enhance prediction. Our ReToP framework surpasses strong baseline models across three clinical tasks on MIMIC-IV. Moreover, our analysis shows that ReToP generalizes to unseen datasets and tasks with minimal fine-tuning while preserving faithful rewrites and emphasizing task-relevant predictive features.