Improved Sub-Visible Particle Classification in Flow Imaging Microscopy via Generative AI-Based Image Synthesis

📅 2025-08-08
🤖 AI Summary
To address **data scarcity** and **severe class imbalance** in flow imaging microscopy, where rare particle classes such as silicone oil droplets and air bubbles are especially underrepresented, this paper proposes a **diffusion-based high-fidelity image synthesis method** designed to augment training data for those classes. The synthesized images are shown to closely resemble real particle images in visual quality and structure. In large-scale experiments on a validation set of 500,000 protein particle images, adding the synthetic images to the training data improves the classification performance of multi-class deep neural networks, with negligible downside. We publicly release the diffusion models, the trained classifiers, and a simple interface to support reproducibility. To our knowledge, this work is the first generative AI solution tailored to rare-class augmentation in pharmaceutical particulate characterization.

📝 Abstract
Sub-visible particle analysis using flow imaging microscopy combined with deep learning has proven effective in identifying particle types, enabling the distinction of harmless components such as silicone oil from protein particles. However, the scarcity of available data and the severe imbalance between particle types within datasets remain substantial hurdles when applying multi-class classifiers to such problems, often forcing researchers to rely on less effective methods. This issue is particularly challenging for particle types that appear unintentionally and in lower numbers, such as silicone oil and air bubbles, as opposed to protein particles, for which obtaining large numbers of images in controlled settings is comparatively straightforward. In this work, we develop a state-of-the-art diffusion model to address data imbalance by generating high-fidelity images that can augment training datasets, enabling the effective training of multi-class deep neural networks. We validate this approach by demonstrating that the generated samples closely resemble real particle images in terms of visual quality and structure. To assess the effectiveness of using diffusion-generated images in training datasets, we conduct large-scale experiments on a validation dataset comprising 500,000 protein particle images and demonstrate that this approach improves classification performance with negligible downside. Finally, to promote open research and reproducibility, we publicly release both our diffusion models and the trained multi-class deep neural network classifiers, along with a straightforward interface for easy integration into future studies, at https://github.com/utkuozbulak/svp-generative-ai.
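
The augmentation idea described in the abstract can be illustrated with a small sketch: count the real images per class, then top up each underrepresented class with synthetic images until it matches the largest class or the synthetic pool runs out. This is only an illustration under stated assumptions, not the authors' pipeline. The function names (`synthetic_top_up`, `augment`), the class labels, and the image identifiers are hypothetical, and the sketch does not use the interface from the released repository.

```python
from collections import Counter
import random


def synthetic_top_up(labels, target=None):
    """Return how many synthetic images each class needs to reach `target`.

    If `target` is None, the size of the largest class is used.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


def augment(real, synthetic_pool, seed=0):
    """Mix real and synthetic samples into one training list.

    real:           list of (image_id, label) pairs for real images
    synthetic_pool: dict mapping label -> list of synthetic image ids
                    (hypothetical stand-in for diffusion-generated images)
    """
    rng = random.Random(seed)
    plan = synthetic_top_up([label for _, label in real])
    augmented = list(real)
    for cls, needed in plan.items():
        pool = synthetic_pool.get(cls, [])
        take = min(needed, len(pool))  # never request more than the pool holds
        augmented.extend((img, cls) for img in rng.sample(pool, take))
    rng.shuffle(augmented)
    return augmented


# Toy imbalanced dataset: many protein particles, few rare-class particles.
real = (
    [(f"p{i}", "protein") for i in range(8)]
    + [("s0", "silicone_oil"), ("s1", "silicone_oil")]
    + [("a0", "air_bubble")]
)
synthetic_pool = {
    "silicone_oil": [f"syn_s{i}" for i in range(10)],
    "air_bubble": [f"syn_a{i}" for i in range(5)],
}

plan = synthetic_top_up([label for _, label in real])
augmented = augment(real, synthetic_pool)
print(plan)
print(Counter(label for _, label in augmented))
```

In this toy case the air bubble class stays slightly below the protein class because the synthetic pool holds fewer images than the plan asks for. Whether to balance fully or only partially is a design choice that the abstract does not specify.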
Problem

Research questions and friction points this paper is trying to address.

Address data scarcity and imbalance in particle classification
Generate synthetic particle images using diffusion models
Improve multi-class classifier performance with augmented datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI synthesizes high-fidelity particle images
Diffusion model mitigates data imbalance for training
Public release of models enhances research reproducibility