TreeText-CTS: Compact, Source-Traceable Tree-Path Evidence for Irregular Clinical Time-Series Prediction

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited interpretability of clinical time-series models, which do not expose a traceable rationale for their predictions. It proposes TreeText-CTS, a framework that converts irregular electronic health records into compact, source-traceable tree-path evidence units without patient-level summaries or inference-time autoregressive decoding. Multi-scale window summaries are routed through frozen XGBoost models, and the activated tree paths are verbalized as deterministic statements composed of threshold conditions. An evidence selector picks an informative subset of these units, which a language-model encoder integrates to produce the prediction. On PhysioNet 2012 mortality, MIMIC-III mortality, and PhysioNet 2019 sepsis-onset forecasting, the method improves AUPRC by 6.0 to 9.7 absolute percentage points over the strongest prior text-based interface and remains competitive with numerical time-series models, while the evidence passed to the final predictor is constructed deterministically and can be traced back to source measurements and time windows.

📝 Abstract

Numerical time-series models can effectively process irregular electronic health record (EHR) trajectories, but they do not naturally expose the measurements and temporal patterns supporting each risk estimate as readable evidence. Existing text-based interfaces improve readability, but typically rely on either raw serialization, which is lengthy and redundant, or patient-level free-form summaries, which are difficult to trace to source measurements and time windows. To bridge this gap, we introduce TreeText-CTS (Clinical Time-Series), which converts irregular EHR trajectories into human-readable, compact, source-traceable tree-path evidence units without patient-level summarization or inference-time autoregressive decoding. TreeText-CTS routes multi-scale window summaries through frozen XGBoost models and verbalizes activated tree paths as deterministic, source-traceable evidence units composed of threshold conditions. An evidence selector assembles an informative subset of these units, which a language-model encoder then integrates for prediction. Across PhysioNet 2012 mortality, MIMIC-III mortality, and PhysioNet 2019 sepsis-onset forecasting, TreeText-CTS achieves the best AUROC and AUPRC among evaluated text-based EHR time-series interfaces, improving AUPRC by 6.0 to 9.7 absolute percentage points over the strongest prior text-based interface while remaining competitive with numerical time-series models. Ablations show that tree-path evidence construction, evidence selection, and language-model composition each contribute to performance. Because every span passed to the language-model encoder is constructed from activated tree-path threshold conditions, TreeText-CTS makes the evidence supplied to the final predictor inspectable and source-traceable.
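The tree-path verbalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the helper names (`summarize`, `activated_path`, `verbalize`), the `variable|stat|window` feature naming, the window scales, and the thresholds are all hypothetical, and a hand-built toy tree stands in for a frozen XGBoost model. The sketch only shows the general idea of summarizing irregular observations over several windows, following the tree path those summaries activate, and emitting the path's threshold conditions as text that names the variable and window it came from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy decision-tree node (hypothetical stand-in for one XGBoost tree)."""
    feature: Optional[str] = None    # e.g. "HR|mean|last_6h"; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None    # taken when value < threshold
    right: Optional["Node"] = None   # taken otherwise
    leaf_value: float = 0.0


def window_mean(obs, variable, t_end, hours):
    """Mean of one variable over the last `hours` before t_end; None if unobserved."""
    vals = [v for (t, name, v) in obs
            if name == variable and t_end - hours <= t <= t_end]
    return sum(vals) / len(vals) if vals else None


def summarize(obs, variables, t_end, scales=(6, 24)):
    """Multi-scale window summaries keyed by 'variable|stat|window'."""
    return {f"{var}|mean|last_{h}h": window_mean(obs, var, t_end, h)
            for var in variables for h in scales}


def activated_path(root, feats):
    """Follow the path the summaries activate; collect its threshold conditions."""
    conds, node = [], root
    while node.feature is not None:
        value = feats[node.feature]
        # Assumption: missing values go right in this toy tree. Real XGBoost
        # learns a default direction for missing values at each split.
        go_left = value is not None and value < node.threshold
        conds.append((node.feature, "<" if go_left else ">=", node.threshold, value))
        node = node.left if go_left else node.right
    return conds, node.leaf_value


def verbalize(conds):
    """Render threshold conditions as one readable, source-traceable evidence unit."""
    parts = []
    for feature, op, thr, value in conds:
        var, stat, window = feature.split("|")
        if value is None:
            parts.append(f"{stat} {var} over {window} is missing")
        else:
            parts.append(f"{stat} {var} over {window} {op} {thr:g}")
    return "; ".join(parts)


# Irregularly sampled observations: (hours since admission, variable, value).
obs = [(1.0, "HR", 88), (20.0, "HR", 118), (22.5, "HR", 126), (23.0, "Lactate", 3.4)]
feats = summarize(obs, ["HR", "Lactate"], t_end=24.0)

# Hand-built toy tree with made-up thresholds and leaf values.
tree = Node("HR|mean|last_6h", 110.0,
            left=Node(leaf_value=-0.3),
            right=Node("Lactate|mean|last_24h", 2.0,
                       left=Node(leaf_value=0.1),
                       right=Node(leaf_value=0.6)))

conds, leaf = activated_path(tree, feats)
evidence_text = verbalize(conds)
print(evidence_text)
```

Because each condition keeps its feature key, an evidence unit such as the one printed here can be mapped back to the variable and time window that produced it, which is the sense of "source-traceable" this sketch assumes.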

Problem

Research questions and friction points this paper is trying to address.

irregular clinical time-series

source-traceable evidence

readable prediction

EHR trajectories

tree-path evidence

Innovation

Methods, ideas, or system contributions that make the work stand out.

tree-path evidence

source-traceable

irregular clinical time-series