🤖 AI Summary
Large-scale medical imaging datasets used for deep learning incur redundant storage, high transfer bandwidth, and heavy compute costs. To address this, the paper introduces the Medical Image Streaming Toolkit (MIST), a format-agnostic streaming database that generates multi-resolution, multi-format (e.g., DICOM, NIfTI, PNG) image streams on demand from a single high-fidelity source, eliminating the need to pre-store redundant copies. Its streaming access framework uses metadata indexing and incremental decoding to adapt images in real time with low overhead and no loss of quality. Evaluated on eight large datasets spanning multiple modalities and anatomical regions, MIST reduces storage footprint and download bandwidth relative to conventional approaches while preserving pixel-level fidelity. The work provides a lightweight, efficient infrastructure foundation for scalable medical imaging AI research.
📝 Abstract
Large-scale medical imaging datasets have accelerated deep learning (DL) for medical image analysis. However, their size increases the storage and bandwidth required to host and access them. Because researchers have different use cases and need different resolutions or formats for DL, it is neither feasible to anticipate every researcher's needs nor practical to store the data in multiple resolutions and formats. To that end, we propose the Medical Image Streaming Toolkit (MIST), a format-agnostic database that streams medical images at different resolutions and in different formats from a single high-resolution copy. We evaluated MIST on eight popular, large-scale medical imaging datasets spanning different body parts, modalities, and formats. Our framework reduced the storage and bandwidth required to host and download these datasets without degrading image quality. By providing a data-efficient, format-agnostic database, MIST meets the diverse needs of researchers and lowers barriers to DL research in medical imaging.
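The core idea of serving several resolutions from one stored copy can be illustrated with a minimal sketch. This is not MIST's API or implementation: the `SingleCopyStore` class, the `downsample` helper, and the block-averaging scheme are all hypothetical stand-ins chosen for illustration, and images are modeled as plain nested lists of integers rather than DICOM or NIfTI data.

```python
from typing import Dict, List

Image = List[List[int]]  # toy 2D image: rows of integer pixel values


def downsample(image: Image, factor: int) -> Image:
    """Block-average a 2D image by an integer factor (hypothetical helper)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out: Image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect the factor x factor block that maps to output pixel (r, c).
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) // len(block))  # integer mean of the block
        out.append(row)
    return out


class SingleCopyStore:
    """Toy in-memory store that keeps one high-resolution copy per image ID."""

    def __init__(self) -> None:
        self._images: Dict[str, Image] = {}

    def put(self, image_id: str, image: Image) -> None:
        self._images[image_id] = image

    def stream(self, image_id: str, factor: int = 1) -> Image:
        """Return the image at 1/factor resolution, derived at request time."""
        source = self._images[image_id]
        return source if factor == 1 else downsample(source, factor)

    def stored_pixels(self) -> int:
        """Total pixels actually held in the store."""
        return sum(len(img) * len(img[0]) for img in self._images.values())


# Usage: store one 8x8 image, then request three resolutions from it.
store = SingleCopyStore()
store.put("scan-001", [[r * 8 + c for c in range(8)] for r in range(8)])

for factor in (1, 2, 4):
    view = store.stream("scan-001", factor)
    print(factor, len(view), len(view[0]))

# Pixels that would be needed if each resolution were stored up front.
prestored = sum(
    len(v) * len(v[0]) for v in (store.stream("scan-001", f) for f in (1, 2, 4))
)
print(store.stored_pixels(), prestored)
```

In this sketch, lower resolutions cost compute at request time instead of storage up front, which is the trade-off the abstract describes. A real system would also have to handle format conversion and partial decoding, which are omitted here.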