ClinStructor: AI-Powered Structuring of Unstructured Clinical Texts

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clinical notes contain rich, high-value contextual information, but their unstructured nature makes models susceptible to gender and racial bias, hampers generalization across electronic health record (EHR) systems, and limits interpretability. To address this, the authors propose a large language model (LLM)-based framework that automatically converts raw clinical text into task-oriented question-answer (QA) pairs, a structured intermediate representation that preserves the semantics of the note. Evaluated on ICU mortality prediction, the method incurs only a modest 2–3 percentage-point AUC drop while substantially improving model transparency, fairness, and cross-EHR generalizability. The key contribution is the first use of LLM-driven QA-style structuring as a unified interface for enhancing interpretability and mitigating bias, while preserving most of the predictive performance, enabling fine-grained controllability, and supporting robust deployment.

📝 Abstract
Clinical notes contain valuable, context-rich information, but their unstructured format introduces several challenges: unintended biases (e.g., gender or racial bias), poor generalization across clinical settings (e.g., models trained on one EHR system may perform poorly on another due to format differences), and poor interpretability. To address these issues, we present ClinStructor, a pipeline that leverages large language models (LLMs) to convert clinical free text into structured, task-specific question-answer pairs prior to predictive modeling. On the ICU mortality prediction task, our method substantially enhances transparency and controllability while incurring only a modest reduction in predictive performance (a 2-3% drop in AUC) compared to direct fine-tuning. ClinStructor lays a strong foundation for building reliable, interpretable, and generalizable machine learning models in clinical environments.
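The structuring step described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the question list, the prompt wording, the `QAPair` type, and the `fake_llm` stand-in are all hypothetical, and a real pipeline would call an actual LLM in place of the stub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    answer: str


# Hypothetical task-specific questions; the paper's actual question set is not shown here.
QUESTIONS = [
    "What is the primary reason for ICU admission?",
    "Is the patient on mechanical ventilation?",
]


def structure_note(
    note: str,
    ask_llm: Callable[[str], str],
    questions: List[str] = QUESTIONS,
) -> List[QAPair]:
    """Ask one question at a time about the note and collect the answers."""
    pairs = []
    for question in questions:
        # Assumed prompt format, for illustration only.
        prompt = (
            f"Clinical note:\n{note}\n\n"
            f"Question: {question}\n"
            "Answer concisely, or reply 'Not mentioned'."
        )
        pairs.append(QAPair(question, ask_llm(prompt).strip()))
    return pairs


def render_pairs(pairs: List[QAPair]) -> str:
    """Serialize QA pairs into the text that a downstream classifier would read."""
    return "\n".join(f"Q: {p.question}\nA: {p.answer}" for p in pairs)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    question_part = prompt.rsplit("Question:", 1)[1]
    return "Yes" if "ventilation" in question_part else "Not mentioned"


if __name__ == "__main__":
    pairs = structure_note("Pt intubated overnight, remains on vent.", fake_llm)
    print(render_pairs(pairs))
```

The predictive model would then be trained on the rendered QA text instead of the raw note, which is what makes each input feature inspectable.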
Problem

Research questions and friction points this paper is trying to address.

Addressing unstructured clinical text challenges like bias and poor generalization
Converting clinical free-text into structured question-answer pairs using LLMs
Enhancing transparency and interpretability while maintaining predictive performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts clinical free-text into structured question-answer pairs
Leverages large language models for clinical text structuring
Enhances transparency and controllability in predictive modeling
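One way to read "controllability" in the points above is that, once a note is a set of question-answer pairs, individual pairs can be withheld from the model before prediction. The sketch below assumes that reading; the function name, the dictionary representation, and the example questions are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable


def drop_questions(qa: Dict[str, str], excluded: Iterable[str]) -> Dict[str, str]:
    """Return a copy of the QA pairs without the excluded questions (case-insensitive)."""
    blocked = {q.lower() for q in excluded}
    return {q: a for q, a in qa.items() if q.lower() not in blocked}


if __name__ == "__main__":
    # Hypothetical structured note.
    structured_note = {
        "What is the patient's gender?": "Female",
        "Is the patient on mechanical ventilation?": "Yes",
    }
    filtered = drop_questions(structured_note, ["What is the patient's gender?"])
    print(filtered)
```

With raw free text, removing such an attribute would require editing the note itself; with QA pairs it becomes a lookup on the question.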
Karthikeyan K
Duke University
Raghuveer Thirukovalluru
Duke University
Natural Language Processing
David Carlson
Duke University