🤖 AI Summary
To address the challenge of adapting large language models (LLMs) to electronic health record (EHR) analysis under stringent privacy constraints and limited computational resources, this paper proposes a lightweight, on-device LLM framework. Methodologically, it introduces a regular-expression-based pre-filtering mechanism combined with retrieval-augmented generation (RAG) to suppress noise in lengthy, unstructured EHR texts. Integrated with zero-/few-shot learning, model compression, and GPU-free deployment, the framework ensures end-to-end privacy preservation and efficient inference. Evaluated on MIMIC-IV and other clinical datasets, it boosts accuracy by 23.5% on tasks including diagnosis extraction and critical biomarker identification, outperforming comparably sized fine-tuned models, and enables real-time inference on CPU-only servers. Key contributions include: (1) the first EHR-specific lightweight on-device deployment paradigm, and (2) a privacy-aware, computationally efficient preprocessing-retrieval-generation co-design architecture that jointly optimizes privacy, latency, and task performance.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language processing; however, their application in sensitive domains such as healthcare, especially in processing Electronic Health Records (EHRs), is constrained by limited computational resources and privacy concerns. This paper introduces a compact LLM framework optimized for local deployment in environments with stringent privacy requirements and restricted access to high-performance GPUs. Our approach leverages simple yet powerful preprocessing techniques, including regular expressions (regex) and Retrieval-Augmented Generation (RAG), to extract and highlight critical information from clinical notes. By pre-filtering long, unstructured text, we enhance the performance of smaller LLMs on EHR-related tasks. Our framework is evaluated using zero-shot and few-shot learning paradigms on both private and publicly available datasets (MIMIC-IV), with additional comparisons against fine-tuned LLMs on MIMIC-IV. Experimental results demonstrate that our preprocessing strategy substantially improves the performance of smaller LLMs, making them well-suited for privacy-sensitive and resource-constrained applications. This study offers valuable insights into optimizing LLM performance for local, secure, and efficient healthcare applications, and provides practical guidance for real-world deployment of LLMs while tackling challenges related to privacy, computational feasibility, and clinical applicability.
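To make the pre-filtering idea concrete, the following is a minimal sketch of regex-based extraction of salient lines from an unstructured clinical note before the text is handed to a small on-device LLM (or to a RAG retriever). The pattern names, the example note, and the specific biomarkers matched are illustrative assumptions, not the paper's actual patterns or data.

```python
import re

# Hypothetical patterns for salient clinical fields; the paper's actual
# regex library is not reproduced here.
SECTION_PATTERNS = {
    "diagnosis": re.compile(r"(?im)^(?:discharge diagnosis|diagnosis)\s*:\s*.+$"),
    "biomarker": re.compile(r"(?i)\b(?:creatinine|hemoglobin|troponin)\b[^\n]{0,40}?\d+(?:\.\d+)?"),
}

def prefilter_note(note: str) -> str:
    """Keep only spans matching the clinical patterns, shrinking the
    context that a compact LLM must process."""
    kept = []
    for name, pattern in SECTION_PATTERNS.items():
        for match in pattern.finditer(note):
            kept.append(f"{name}: {match.group(0).strip()}")
    return "\n".join(kept)

note = """Admission note. Patient comfortable on room air.
Diagnosis: community-acquired pneumonia
Labs notable for creatinine 1.8 and hemoglobin 9.2.
A long social history section follows..."""

# The filtered output retains the diagnosis and biomarker values while
# dropping narrative text that would dilute a small model's context window.
print(prefilter_note(note))
```

In the framework described above, a filtered snippet like this would then be embedded and retrieved via RAG to build a compact, information-dense prompt, which is what lets smaller CPU-only models stay competitive on diagnosis-extraction tasks.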