🤖 AI Summary
To address the challenge of adapting large language models (LLMs) to electronic health record (EHR) analysis under stringent privacy constraints and limited computational resources, this paper proposes a lightweight, on-device LLM framework. Methodologically, it introduces a regular-expression-based pre-filtering mechanism combined with retrieval-augmented generation (RAG) to suppress noise in lengthy, unstructured EHR texts. Integrated with zero-/few-shot learning, model compression, and GPU-free deployment, the framework ensures end-to-end privacy preservation and efficient inference. Evaluated on MIMIC-IV and other clinical datasets, it boosts accuracy by 23.5% on tasks including diagnosis extraction and critical biomarker identification, outperforming comparably sized fine-tuned models, and enables real-time inference on CPU-only servers. Key contributions include: (1) the first EHR-specific lightweight on-device deployment paradigm, and (2) a privacy-aware, computationally efficient preprocessing-retrieval-generation co-design architecture that jointly optimizes privacy, latency, and task performance.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in natural language processing; however, their application in sensitive domains such as healthcare, especially in processing Electronic Health Records (EHRs), is constrained by limited computational resources and privacy concerns. This paper introduces a compact LLM framework optimized for local deployment in environments with stringent privacy requirements and restricted access to high-performance GPUs. Our approach leverages simple yet powerful preprocessing techniques, including regular expressions (regex) and Retrieval-Augmented Generation (RAG), to extract and highlight critical information from clinical notes. By pre-filtering long, unstructured text, we enhance the performance of smaller LLMs on EHR-related tasks. Our framework is evaluated using zero-shot and few-shot learning paradigms on both private and publicly available datasets (MIMIC-IV), with additional comparisons against fine-tuned LLMs on MIMIC-IV. Experimental results demonstrate that our preprocessing strategy substantially improves the performance of smaller LLMs, making them well-suited for privacy-sensitive and resource-constrained applications. This study offers valuable insights into optimizing LLM performance for local, secure, and efficient healthcare applications, and provides practical guidance for real-world deployment of LLMs while tackling challenges related to privacy, computational feasibility, and clinical applicability.
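To make the pre-filtering idea concrete, the following is a minimal sketch of regex-based extraction of salient lines from an unstructured clinical note before the text is handed to a small on-device LLM (or to a RAG retriever). The pattern names, the example note, and the specific biomarkers matched are illustrative assumptions, not the paper's actual patterns or data.

```python
import re

# Hypothetical patterns for salient clinical fields; the paper's actual
# regex library is not reproduced here.
SECTION_PATTERNS = {
    "diagnosis": re.compile(r"(?im)^(?:discharge diagnosis|diagnosis)\s*:\s*.+$"),
    "biomarker": re.compile(r"(?i)\b(?:creatinine|hemoglobin|troponin)\b[^\n]{0,40}?\d+(?:\.\d+)?"),
}

def prefilter_note(note: str) -> str:
    """Keep only spans matching the clinical patterns, shrinking the
    context that a compact LLM must process."""
    kept = []
    for name, pattern in SECTION_PATTERNS.items():
        for match in pattern.finditer(note):
            kept.append(f"{name}: {match.group(0).strip()}")
    return "\n".join(kept)

note = """Admission note. Patient comfortable on room air.
Diagnosis: community-acquired pneumonia
Labs notable for creatinine 1.8 and hemoglobin 9.2.
A long social history section follows..."""

# The filtered output retains the diagnosis and biomarker values while
# dropping narrative text that would dilute a small model's context window.
print(prefilter_note(note))
```

In the framework described above, a filtered snippet like this would then be embedded and retrieved via RAG to build a compact, information-dense prompt, which is what lets smaller CPU-only models stay competitive on diagnosis-extraction tasks.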