PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions

📅 2025-06-29
🤖 AI Summary
Problem: Existing histopathology image generation methods are constrained by the scarcity of paired text and mask annotations in public datasets, which hinders joint modeling of semantic descriptions and spatial structures. Method: We propose the first bimodal disentangled diffusion model for unpaired text and mask conditioning. By leveraging contrastive learning and cross-modal alignment, our approach maps the two heterogeneous modalities into a unified conditional embedding space, enabling disentangled control over semantics and geometry without requiring paired data. The model accepts fine-grained textual inputs (e.g., diagnostic reports) and morphological masks (e.g., cell/tissue regions) as independent, controllable conditions. Contribution/Results: Experiments demonstrate state-of-the-art performance in FID score, semantic fidelity, and mask overlap accuracy. Moreover, downstream tasks, including cell segmentation and classification, achieve significant performance gains, supporting the effectiveness and generalizability of the bimodal conditioning.
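The summary mentions mask overlap accuracy without naming the metric. As a minimal sketch, assuming overlap is measured with the Dice coefficient and intersection-over-union (IoU) between the conditioning mask and a mask recovered from the generated image, the computation could look like this. The function name `dice_and_iou` and the toy masks are illustrative and do not come from the paper.

```python
def dice_and_iou(mask_a, mask_b):
    """Return (Dice, IoU) for two binary masks given as equally sized 2D lists."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    union = total - intersection
    if total == 0:
        # Both masks are empty: treat this as perfect agreement (a convention,
        # not something the source specifies).
        return 1.0, 1.0
    return 2 * intersection / total, intersection / union


# Toy example: the mask used as a condition vs. a mask segmented from the
# synthesized image (both hypothetical).
condition_mask = [[0, 1, 1],
                  [0, 1, 0],
                  [0, 0, 0]]
recovered_mask = [[0, 1, 0],
                  [0, 1, 1],
                  [0, 0, 0]]
dice, iou = dice_and_iou(condition_mask, recovered_mask)
print(f"Dice={dice:.3f} IoU={iou:.3f}")
```

Both scores reach 1.0 only when the generated image reproduces the conditioning mask exactly, so they are a natural way to check how faithfully a generator follows a spatial condition.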

📝 Abstract
Diffusion-based generative models have shown promise in synthesizing histopathology images to address data scarcity caused by privacy constraints. Diagnostic text reports provide high-level semantic descriptions, and masks offer fine-grained spatial structures essential for representing distinct morphological regions. However, public datasets lack paired text and mask data for the same histopathological images, limiting their joint use in image generation. This constraint restricts the ability to fully exploit the benefits of combining both modalities for enhanced control over semantics and spatial details. To overcome this, we propose PathDiff, a diffusion framework that effectively learns from unpaired mask-text data by integrating both modalities into a unified conditioning space. PathDiff allows precise control over structural and contextual features, generating high-quality, semantically accurate images. PathDiff also improves image fidelity, text-image alignment, and faithfulness, enhancing data augmentation for downstream tasks like nuclei segmentation and classification. Extensive experiments demonstrate its superiority over existing methods.
Problem

Research questions and friction points this paper is trying to address.

Synthesizing histopathology images with unpaired text and mask data
Overcoming data scarcity due to privacy constraints in medical imaging
Enhancing control over image semantics and spatial details
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion framework integrates unpaired mask-text data
Unified conditioning space enhances semantic and spatial control
Improves image fidelity and text-image alignment
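One common way to let a conditional generator learn from samples that carry only text or only a mask is to substitute a placeholder ("null") token for the missing modality, so that every sample produces a conditioning sequence in the same embedding space. The sketch below illustrates that general idea only. It is an assumption for illustration, not the PathDiff implementation, and `EMBED_DIM`, `NULL_TEXT`, `NULL_MASK`, and `build_condition` are hypothetical names.

```python
import random

EMBED_DIM = 4  # hypothetical width of the shared conditioning space
_rng = random.Random(0)


def random_tokens(n):
    """Stand-in for an encoder output: n tokens of width EMBED_DIM."""
    return [[_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(n)]


# Hypothetical placeholders used when a sample lacks a modality. In a real
# model these would typically be learned parameters.
NULL_TEXT = random_tokens(1)
NULL_MASK = random_tokens(1)


def build_condition(text_tokens=None, mask_tokens=None):
    """Concatenate text and mask tokens into one conditioning sequence.

    Each argument is a list of EMBED_DIM-wide tokens, or None when the
    training sample does not carry that modality.
    """
    text_part = NULL_TEXT if text_tokens is None else text_tokens
    mask_part = NULL_MASK if mask_tokens is None else mask_tokens
    sequence = text_part + mask_part
    if any(len(token) != EMBED_DIM for token in sequence):
        raise ValueError("all tokens must live in the shared embedding space")
    return sequence


# A text-only sample (e.g. from a report dataset) and a mask-only sample
# (e.g. from a nuclei-segmentation dataset) yield the same kind of input.
text_only = build_condition(text_tokens=random_tokens(5))
mask_only = build_condition(mask_tokens=random_tokens(3))
both = build_condition(random_tokens(5), random_tokens(3))
print(len(text_only), len(mask_only), len(both))
```

Because the downstream network always sees a text slot followed by a mask slot, the two conditions can be supplied together at inference time even if no training sample ever contained both.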