Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control

📅 2025-12-24

🤖 AI Summary

In computational pathology, generative models have long suffered from weak semantic controllability, a scarcity of high-quality image-text data, and terminological heterogeneity. To address these challenges, we propose UniPath, a semantics-driven controllable generation framework. It is built on a multi-stream semantic control mechanism that integrates raw text, diagnostic semantic tokens, and morphological prototypes. We introduce paraphrase-robust diagnostic semantic token extraction and attribute-bundle expansion. We also construct a large-scale pathology image-text dataset (2.65M images, including a 68K finely annotated subset). Learnable queries distill diagnostic semantics from a frozen pathology multimodal large language model (MLLM), while a prototype bank provides component-level morphological control. In addition, we establish a four-tier, pathology-specific evaluation protocol. Experiments show state-of-the-art performance: a Patho-FID of 80.9 (51% better than the second-best method), and fine-grained semantic control reaching 98.7% of the real-image level. Code, model weights, and the dataset are to be released publicly.
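
The distillation step can be pictured as a fixed set of learnable query vectors that cross-attend over the hidden states of a frozen MLLM and pool them into a fixed number of output tokens. The following is a minimal pure-Python sketch under that assumption. It uses single-head scaled dot-product attention with no learned projections, random stand-ins for the MLLM states, and hypothetical names (`distill_semantic_tokens`, `frozen_states`). It is not UniPath's implementation: the summary does not give the query count, dimensions, or training objective.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distill_semantic_tokens(queries, frozen_states):
    """Hypothetical sketch: single-head cross-attention without projections.

    Each query attends over the frozen MLLM hidden states and pools them
    into one output vector (a stand-in for one semantic token).
    """
    dim = len(queries[0])
    tokens = []
    for q in queries:
        # Scaled dot-product score between this query and every hidden state.
        scores = [sum(a * b for a, b in zip(q, h)) / math.sqrt(dim)
                  for h in frozen_states]
        weights = softmax(scores)
        # Attention-weighted average of the hidden states, per dimension.
        pooled = [sum(w * h[j] for w, h in zip(weights, frozen_states))
                  for j in range(dim)]
        tokens.append(pooled)
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    dim, num_queries, seq_len = 8, 4, 10
    # Would be trained in a real model; here just small random values.
    queries = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_queries)]
    # Random stand-in for hidden states produced by a frozen pathology MLLM.
    frozen_states = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(seq_len)]
    tokens = distill_semantic_tokens(queries, frozen_states)
    print(len(tokens), len(tokens[0]))  # 4 8
```

Because the number of queries is fixed, the output has the same length however long the prompt is. Any robustness to paraphrase would have to come from training the queries, which this sketch leaves out.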

📝 Abstract

In computational pathology, understanding and generation have evolved along disparate paths. Advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image-text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control with three streams. The first is a Raw-Text stream. The second is a High-Level Semantics stream that applies learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles. The third is a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image-text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's state-of-the-art performance, including a Patho-FID of 80.9 (51% better than the second-best method) and fine-grained semantic control reaching 98.7% of the real-image level. The curated datasets, complete source code, and pre-trained model weights will be made publicly available.
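
The abstract names three conditioning streams but does not say how they are fused. One simple option, assumed here purely for illustration, is to concatenate each stream's tokens into one sequence. A per-stream embedding is added to every token so a downstream generator can tell the streams apart. All names in this sketch (`build_conditioning`, `STREAM_ORDER`, the stream keys) are hypothetical.

```python
# Hypothetical fixed ordering of the three streams named in the abstract.
STREAM_ORDER = ("raw_text", "semantic", "prototype")


def build_conditioning(streams, stream_embeddings):
    """Concatenate per-stream token lists into one conditioning sequence.

    streams: dict mapping stream name -> list of token vectors. A stream
        may be absent, e.g. when no prototypes are requested.
    stream_embeddings: dict mapping stream name -> vector added to every
        token of that stream so the streams remain distinguishable.
    Returns the fused sequence and a parallel list of stream labels.
    """
    sequence, labels = [], []
    for name in STREAM_ORDER:
        offset = stream_embeddings[name]
        for token in streams.get(name, []):
            sequence.append([t + o for t, o in zip(token, offset)])
            labels.append(name)
    return sequence, labels


if __name__ == "__main__":
    embeds = {"raw_text": [0.0, 0.0], "semantic": [1.0, 0.0], "prototype": [0.0, 1.0]}
    streams = {
        "raw_text": [[0.1, 0.2], [0.3, 0.4]],  # e.g. text-encoder outputs
        "semantic": [[0.5, 0.5]],              # e.g. distilled semantic tokens
        # no "prototype" entry: that stream contributes no tokens
    }
    seq, labels = build_conditioning(streams, embeds)
    print(labels)  # ['raw_text', 'raw_text', 'semantic']
    print(seq[2])  # [1.5, 0.5]
```

In this design a missing stream simply contributes no tokens, which is one way to let a user supply only some of the controls. Whether UniPath handles optional streams this way is not stated in the summary.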

Problem

Research questions and friction points this paper is trying to address.

Generates pathology images using diagnostic semantics

Addresses data scarcity with curated image-text corpus

Enables component-level morphological control via a prototype bank
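
A prototype bank can be read as a store of reference vectors grouped by tissue component. One vector per requested component is retrieved and passed to the generator as conditioning. This sketch assumes cosine-similarity nearest-neighbour retrieval, and its component names and vectors are hypothetical. The summary does not describe how UniPath builds or indexes its prototypes.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class PrototypeBank:
    """Hypothetical store of morphological prototype vectors per component."""

    def __init__(self):
        self._bank = {}

    def add(self, component, vector):
        self._bank.setdefault(component, []).append(list(vector))

    def nearest(self, component, query):
        """Return the stored prototype of `component` most similar to `query`."""
        candidates = self._bank.get(component)
        if not candidates:
            raise KeyError(f"no prototypes for component {component!r}")
        return max(candidates, key=lambda p: cosine(p, query))

    def control_tokens(self, spec):
        """spec: dict component -> query vector.

        Returns one prototype per requested component, in the order given,
        for use as conditioning tokens.
        """
        return [self.nearest(c, q) for c, q in spec.items()]


if __name__ == "__main__":
    bank = PrototypeBank()
    bank.add("nuclei", [1.0, 0.0])  # hypothetical prototype A
    bank.add("nuclei", [0.0, 1.0])  # hypothetical prototype B
    bank.add("stroma", [1.0, 1.0])
    print(bank.control_tokens({"nuclei": [0.9, 0.1], "stroma": [0.2, 0.8]}))
    # [[1.0, 0.0], [1.0, 1.0]]
```

Retrieving by component keeps each control independent, so changing the query for one component does not affect the prototypes chosen for the others.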

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using paraphrase-robust diagnostic semantic tokens for reliable text conditioning

Implementing multi-stream control with prototype-level morphological guidance

Curating a large-scale pathology image-text corpus (2.65M images, with a finely annotated 68K subset) to alleviate data scarcity
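
In UniPath, paraphrase robustness and attribute-bundle expansion come from a frozen pathology MLLM queried through learnable tokens. This sketch swaps that in for a plain lookup table, purely to show the input/output contract. Different phrasings of one diagnosis collapse to one canonical concept, which is then expanded into a bundle of attributes. The lexicon, bundle contents, and function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical lexicon: several surface phrasings map to one canonical concept.
SYNONYMS = {
    "idc": "invasive ductal carcinoma",
    "invasive ductal carcinoma": "invasive ductal carcinoma",
    "invasive carcinoma nst": "invasive ductal carcinoma",
    "invasive carcinoma of no special type": "invasive ductal carcinoma",
}

# Hypothetical attribute bundles keyed by canonical concept.
BUNDLES = {
    "invasive ductal carcinoma": [
        "tubule formation",
        "nuclear pleomorphism",
        "mitotic activity",
    ],
}


def normalize(phrase):
    """Lowercase, replace punctuation with spaces, collapse whitespace, look up."""
    key = " ".join(re.sub(r"[^a-z0-9 ]", " ", phrase.lower()).split())
    return SYNONYMS.get(key)


def expand_prompt(phrase):
    """Map a phrase to its canonical concept plus that concept's attribute bundle."""
    concept = normalize(phrase)
    return {"concept": concept, "attributes": list(BUNDLES.get(concept, []))}


if __name__ == "__main__":
    for p in ["IDC", "Invasive carcinoma (NST)", "invasive carcinoma of no special type"]:
        print(expand_prompt(p)["concept"])  # invasive ductal carcinoma (each time)
```

A lookup table fails on any phrasing it has not seen, which is exactly the gap a learned, MLLM-based mapping is meant to close.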
