A Multimodal Pipeline for Clinical Data Extraction: Applying Vision-Language Models to Scans of Transfusion Reaction Reports

📅 2025-04-28

🤖 AI Summary

Paper-based medical forms require manual transcription, which is slow and error-prone and undermines the accuracy of regulatory reporting. This study introduces an open-source multimodal pipeline for end-to-end automated extraction and classification of checkbox data from scanned transfusion reaction reports. The pipeline combines checkbox detection (YOLOv8), multilingual OCR (PaddleOCR), and a multilingual vision-language model (mPLUG-Owl2). Its key contribution is the integration of a multilingual VLM into clinical form parsing, with a design intended to carry over to other checkbox-dense documents. Evaluated against gold-standard data spanning 2017–2024, the system achieves high precision and recall, reducing administrative workload while supporting accurate regulatory reporting. The open-source implementation can be deployed locally and adapted to other languages.
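The three-stage architecture can be pictured as a simple chain of callables. The sketch below is a hypothetical orchestration, not the authors' code: the names `run_pipeline`, `Checkbox`, `TextSpan`, and `FieldResult` are illustrative, the detector is assumed to output a checked/unchecked state directly, and the VLM's role is assumed to be mapping each option's text to a report category. Real YOLOv8, PaddleOCR, and mPLUG-Owl2 calls would replace the stub functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Bounding box as (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class Checkbox:
    box: Box
    checked: bool  # assumption: the detector also classifies ticked vs. empty


@dataclass
class TextSpan:
    box: Box
    text: str


@dataclass
class FieldResult:
    label: str
    checked: bool
    category: str


def run_pipeline(
    page: object,
    detect: Callable[[object], List[Checkbox]],
    ocr: Callable[[object], List[TextSpan]],
    pair: Callable[[Checkbox, List[TextSpan]], str],
    categorize: Callable[[object, str], str],
) -> List[FieldResult]:
    """Chain the stages; each stage is injected so models can be swapped."""
    boxes = detect(page)  # stage 1: checkbox detection
    spans = ocr(page)     # stage 2: OCR over the whole page
    results = []
    for cb in boxes:
        label = pair(cb, spans)              # attach printed option text to the box
        category = categorize(page, label)   # stage 3: hypothetical VLM categorization
        results.append(FieldResult(label, cb.checked, category))
    return results


# Hypothetical stubs standing in for the real models.
def fake_detect(page): return [Checkbox((10, 10, 20, 20), True)]
def fake_ocr(page): return [TextSpan((25, 10, 80, 20), "Fever")]
def fake_pair(cb, spans): return spans[0].text
def fake_categorize(page, label): return "fever" if label == "Fever" else "other"


print(run_pipeline(None, fake_detect, fake_ocr, fake_pair, fake_categorize))
```

Injecting each stage as a callable keeps the detector, OCR engine, and VLM independently replaceable, which matches the summary's emphasis on self-hosting and language adaptation.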

📝 Abstract

Despite the growing adoption of electronic health records, many processes still rely on paper documents, reflecting the heterogeneous real-world conditions in which healthcare is delivered. Manually transcribing paper-based data into digital formats is time-consuming and error-prone. To streamline this workflow, this study presents an open-source pipeline that extracts and categorizes checkbox data from scanned documents. Although demonstrated on transfusion reaction reports, the design supports adaptation to other checkbox-rich document types. The method integrates checkbox detection, multilingual optical character recognition (OCR), and multilingual vision-language models (VLMs). Compared against annually compiled gold standards from 2017 to 2024, the pipeline achieves high precision and recall. The result is a reduced administrative workload and accurate regulatory reporting. Because the pipeline is open source, it can be self-hosted for parsing checkbox forms.
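Precision and recall against a gold standard can be computed by treating each ticked checkbox as one fact. The sketch below is illustrative only: it assumes the gold standard can be expressed as a set of `(report_id, field)` pairs, and the function name and field names are hypothetical. If the annual gold standards contain only aggregate counts, the comparison would have to be made on counts instead.

```python
from typing import Iterable, Set, Tuple

# One extracted fact: (report identifier, checkbox field that is ticked).
Fact = Tuple[str, str]


def precision_recall(predicted: Iterable[Fact], gold: Iterable[Fact]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over sets of ticked fields."""
    pred_set: Set[Fact] = set(predicted)
    gold_set: Set[Fact] = set(gold)
    tp = len(pred_set & gold_set)  # ticked in both
    fp = len(pred_set - gold_set)  # extracted but absent from gold
    fn = len(gold_set - pred_set)  # in gold but missed by the pipeline
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data, not results from the paper.
gold = {("r1", "fever"), ("r1", "chills"), ("r2", "urticaria")}
pred = {("r1", "fever"), ("r2", "urticaria"), ("r2", "dyspnea")}
print(precision_recall(pred, gold))
```

Here two of three predictions are correct and two of three gold facts are found, so precision, recall, and F1 are all 2/3.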

Problem

Research questions and friction points this paper is trying to address.

Extracting data from paper-based clinical documents efficiently

Reducing errors in manual transcription of checkbox forms

Streamlining the workflow with multimodal vision-language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates checkbox detection for form processing

Uses multilingual OCR and vision-language models

Open-source pipeline for self-hosted parsing
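Combining checkbox detection with OCR requires linking each detected box to the option text printed next to it. The following is a hypothetical geometric heuristic, not the paper's method (the pipeline may delegate this step to the VLM): it assumes left-to-right forms where the label sits on the same line to the right of the box. The function name `label_for_checkbox` and the `tol` parameter are illustrative.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), y grows downward


def label_for_checkbox(cb: Box, spans: List[Tuple[Box, str]], tol: float = 0.5) -> Optional[str]:
    """Return the text of the nearest OCR span to the right on the same line, or None."""
    _, cy0, cx1, cy1 = cb
    height = cy1 - cy0
    cb_mid = (cy0 + cy1) / 2
    best, best_gap = None, float("inf")
    for (x0, y0, _x1, y1), text in spans:
        mid = (y0 + y1) / 2
        if abs(mid - cb_mid) > tol * height:  # vertical centers too far apart: different line
            continue
        gap = x0 - cx1  # horizontal distance from the box's right edge
        if gap < 0:     # span starts left of the box, so it cannot be its label
            continue
        if gap < best_gap:
            best, best_gap = text, gap
    return best


# Hypothetical OCR output for a two-row form fragment.
spans = [((25, 10, 80, 20), "Fever"), ((120, 10, 180, 20), "Chills"), ((25, 40, 90, 50), "Urticaria")]
print(label_for_checkbox((10, 10, 20, 20), spans))   # Fever
print(label_for_checkbox((105, 10, 115, 20), spans))  # Chills
print(label_for_checkbox((10, 40, 20, 50), spans))   # Urticaria
```

A rule like this is cheap and transparent but breaks on multi-column layouts or labels placed left of the box, which is one motivation for handing ambiguous cases to a vision-language model.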