A Multimodal Pipeline for Clinical Data Extraction: Applying Vision-Language Models to Scans of Transfusion Reaction Reports

📅 2025-04-28
🤖 AI Summary
Paper-based medical forms and their manual transcription are slow and error-prone, which undermines the accuracy of regulatory reporting. This study introduces an open-source multimodal pipeline for end-to-end automated extraction and classification of checkbox data from scanned transfusion reaction reports. The pipeline integrates checkbox detection (YOLOv8), multilingual OCR (PaddleOCR), and a multilingual vision-language model (mPLUG-Owl2). A key contribution is the use of a multilingual VLM for clinical form parsing, and the design supports adaptation to other checkbox-rich document types. Evaluated against annually compiled gold standards from 2017 to 2024, the system achieves high precision and recall, reducing administrative workload while supporting accurate regulatory reporting. The open-source implementation supports self-hosted deployment and multilingual adaptation.
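
The summary names the detection, OCR, and VLM stages but not how their outputs are joined into structured fields. The following is a minimal sketch of one plausible glue step, under the assumption that each checkbox's label is the nearest OCR text line starting to its right on the same row. This geometric heuristic is a swapped-in illustration, not the authors' method (the pipeline may delegate this step to the VLM), and every name in it (`Box`, `Checkbox`, `TextLine`, `pair_checkboxes_with_labels`, `max_gap`) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cy(self) -> float:
        # Vertical centre, used to decide whether a text line is on the same row.
        return (self.y0 + self.y1) / 2


@dataclass(frozen=True)
class Checkbox:
    box: Box
    checked: bool  # hypothetical: state predicted by an upstream detector


@dataclass(frozen=True)
class TextLine:
    box: Box
    text: str  # hypothetical: text recognised by an upstream OCR step


def pair_checkboxes_with_labels(
    checkboxes: list[Checkbox], lines: list[TextLine], max_gap: float = 200.0
) -> dict[str, bool]:
    """Map each checkbox to the closest text line to its right on the same row.

    Returns {label text: checked state}; checkboxes with no label within
    max_gap pixels are skipped.
    """
    result: dict[str, bool] = {}
    for cb in checkboxes:
        best, best_gap = None, max_gap
        for line in lines:
            same_row = line.box.y0 <= cb.box.cy <= line.box.y1
            gap = line.box.x0 - cb.box.x1  # horizontal distance to the right
            if same_row and 0 <= gap < best_gap:
                best, best_gap = line, gap
        if best is not None:
            result[best.text] = cb.checked
    return result


if __name__ == "__main__":
    boxes = [Checkbox(Box(10, 10, 30, 30), True), Checkbox(Box(10, 50, 30, 70), False)]
    lines = [TextLine(Box(40, 12, 200, 28), "Fever"), TextLine(Box(40, 52, 200, 68), "Chills")]
    print(pair_checkboxes_with_labels(boxes, lines))  # {'Fever': True, 'Chills': False}
```

A purely geometric rule like this breaks on multi-column layouts or labels placed to the left of a box, which is one reason a layout-aware VLM is attractive for this step.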

📝 Abstract
Despite the growing adoption of electronic health records, many processes still rely on paper documents, reflecting the heterogeneous real-world conditions in which healthcare is delivered. The manual transcription process is time-consuming and prone to errors when transferring paper-based data to digital formats. To streamline this workflow, this study presents an open-source pipeline that extracts and categorizes checkbox data from scanned documents. Demonstrated on transfusion reaction reports, the design supports adaptation to other checkbox-rich document types. The proposed method integrates checkbox detection, multilingual optical character recognition (OCR), and multilingual vision-language models (VLMs). The pipeline achieves high precision and recall compared against annually compiled gold standards from 2017 to 2024. The result is a reduced administrative workload and accurate regulatory reporting. The open-source availability of this pipeline encourages self-hosted parsing of checkbox forms.
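
The abstract reports precision and recall against annual gold standards without spelling out the unit of comparison. A minimal sketch, assuming each item is a (report ID, checked field) pair so that extracted and gold data can be compared as sets; the function name `precision_recall` and the field names in the example are hypothetical.

```python
from __future__ import annotations


def precision_recall(
    predicted: set[tuple[str, str]], gold: set[tuple[str, str]]
) -> tuple[float, float]:
    """Set-based precision and recall of extracted items against a gold standard.

    Assumption: an item is a (report_id, field) pair for a box marked as checked.
    By convention here, an empty prediction or gold set yields 0.0.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    extracted = {("r1", "fever"), ("r2", "rash")}
    reference = {("r1", "fever"), ("r2", "rash"), ("r2", "dyspnea")}
    p, r = precision_recall(extracted, reference)
    print(round(p, 3), round(r, 3))  # 1.0 0.667
```

Treating each checked field as a set element makes a missed checkbox lower recall and a spurious one lower precision, which matches the error types a manual transcription audit would count.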
Problem

Research questions and friction points this paper is trying to address.

Extracting data from paper-based clinical documents efficiently
Reducing errors in manual transcription of checkbox forms
Streamlining the workflow with multimodal vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates checkbox detection for form processing
Uses multilingual OCR and vision-language models
Open-source pipeline for self-hosted parsing
Henning Schafer
Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, Germany; Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, Germany
C. S. Schmidt
Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, Germany; Institute for AI in Medicine (IKIM), University Hospital Essen, Girardetstraße 2, Essen, Germany
Johannes Wutzkowsky
Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, Germany
Kamil Lorek
Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, Germany
Lea Reinartz
Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, Germany
Johannes Ruckert
Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, Germany
Christian Temme
Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, Germany
Britta Bockmann
Department of Computer Science, University of Applied Sciences and Arts Dortmund (FHDO), Emil-Figge Str. 42, Dortmund, Germany
Peter A. Horn
Institute for Transfusion Medicine, University Hospital Essen, Hufelandstraße 55, Essen, Germany
Christoph M. Friedrich
Professor of Biomedical Computer Science, University of Applied Sciences and Arts, Dortmund
machine learning, biomedical applications, text mining