🤖 AI Summary
Whole-slide images (WSIs) are difficult to preprocess because of their ultra-high resolution and substantial staining and scanning variability, which has led to fragmented, non-reproducible pipelines for tissue detection, tiling, stain normalization, and annotation parsing. To address this, we propose PySlyde, the first lightweight, open-source, unified WSI preprocessing framework built upon OpenSlide. It natively supports sliding-window loading, automatic tissue region identification, adaptive tiling, standardized stain normalization, and structured annotation parsing, and it directly produces outputs compatible with mainstream pathology foundation models. The framework substantially lowers the barrier to AI-ready data preparation, improves the reproducibility and efficiency of preprocessing, and accelerates AI-ready dataset generation by 3–5× in empirical evaluation. The implementation is publicly released to foster community adoption, integration, and extensibility.
📝 Abstract
The integration of artificial intelligence (AI) into pathology is advancing precision medicine by improving diagnosis, treatment planning, and patient outcomes. Digitised whole-slide images (WSIs) capture rich spatial and morphological information vital for understanding disease biology, yet their gigapixel scale and variability pose major challenges for standardisation and analysis. Robust preprocessing, covering tissue detection, tessellation, stain normalisation, and annotation parsing, is critical but is often held back by fragmented and inconsistent workflows. We present PySlyde, a lightweight, open-source Python toolkit built on OpenSlide to simplify and standardise WSI preprocessing. PySlyde provides an intuitive API for slide loading, annotation management, tissue detection, tiling, and feature extraction, and is compatible with modern pathology foundation models. By unifying these processes, it streamlines WSI preprocessing, enhances reproducibility, and accelerates the generation of AI-ready datasets, enabling researchers to focus on model development and downstream analysis.
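To make the tissue detection and tiling steps concrete, the following is a minimal pure-Python sketch of the general idea, not PySlyde's actual API. The function names `tissue_mask` and `tile_coords`, the saturation threshold, and the `min_tissue` cut-off are all hypothetical choices made for illustration. It assumes a low-resolution RGB thumbnail is already available; with OpenSlide, such a thumbnail could be obtained from `OpenSlide.get_thumbnail(size)` and the selected tiles read with `OpenSlide.read_region(location, level, size)`.

```python
from typing import Iterator, List, Sequence, Tuple

RGB = Tuple[int, int, int]


def tissue_mask(thumb: Sequence[Sequence[RGB]],
                sat_threshold: float = 0.1) -> List[List[bool]]:
    """Mark thumbnail pixels as tissue when their HSV saturation exceeds a threshold.

    Stained tissue is coloured, while glass background is close to white or grey,
    so saturation is a simple (assumed, not PySlyde-specific) separator.
    """
    mask = []
    for row in thumb:
        mask_row = []
        for r, g, b in row:
            hi, lo = max(r, g, b), min(r, g, b)
            saturation = (hi - lo) / hi if hi else 0.0
            mask_row.append(saturation > sat_threshold)
        mask.append(mask_row)
    return mask


def tile_coords(slide_size: Tuple[int, int], tile_size: int, stride: int,
                mask: List[List[bool]], downsample: int,
                min_tissue: float = 0.5) -> Iterator[Tuple[int, int]]:
    """Yield level-0 (x, y) origins of tiles whose tissue fraction is high enough.

    `downsample` is the number of level-0 pixels covered by one mask pixel.
    """
    width, height = slide_size
    for y in range(0, height - tile_size + 1, stride):
        for x in range(0, width - tile_size + 1, stride):
            # Project the level-0 tile onto the low-resolution mask grid.
            x0, y0 = x // downsample, y // downsample
            x1 = max(x0 + 1, (x + tile_size) // downsample)
            y1 = max(y0 + 1, (y + tile_size) // downsample)
            cells = [mask[j][i]
                     for j in range(y0, min(y1, len(mask)))
                     for i in range(x0, min(x1, len(mask[0])))]
            if cells and sum(cells) / len(cells) >= min_tissue:
                yield (x, y)


# Toy 4x4 thumbnail: left half pink "tissue", right half near-white background.
pink, white = (200, 100, 150), (240, 240, 240)
thumb = [[pink, pink, white, white] for _ in range(4)]

# Pretend the slide is 256x256 at level 0, so one mask pixel covers 64 pixels.
coords = list(tile_coords((256, 256), tile_size=128, stride=128,
                          mask=tissue_mask(thumb), downsample=64))
print(coords)  # [(0, 0), (0, 128)]
```

Filtering on a thumbnail-sized mask keeps the gigapixel image out of memory: only tiles that pass the tissue check would ever need to be read at full resolution.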