Beyond Pixel Simulation: Pathology Image Generation via Diagnostic Semantic Tokens and Prototype Control

📅 2025-12-24
🤖 AI Summary
Generative models in computational pathology have been held back by weak semantic controllability, scarce high-quality image-text data, and heterogeneous terminology. The paper proposes UniPath, a semantics-driven controllable generation framework built on multi-stream control that combines raw text, Diagnostic Semantic Tokens, and morphological prototypes. Learnable queries to a frozen pathology multimodal large language model (MLLM) distill paraphrase-robust Diagnostic Semantic Tokens and expand prompts into diagnosis-aware attribute bundles, while a prototype bank provides component-level morphological control. To alleviate data scarcity, the authors curate a 2.65M image-text corpus and a finely annotated, high-quality 68K subset, and they establish a four-tier pathology-specific evaluation hierarchy. Reported results include a Patho-FID of 80.9 (51% better than the second-best method) and fine-grained semantic control reaching 98.7% of the real-image level. The authors state that the datasets, source code, and pre-trained weights will be made publicly available.

📝 Abstract
In computational pathology, understanding and generation have evolved along disparate paths: advanced understanding models already exhibit diagnostic-level competence, whereas generative models largely simulate pixels. Progress remains hindered by three coupled factors: the scarcity of large, high-quality image-text corpora; the lack of precise, fine-grained semantic control, which forces reliance on non-semantic cues; and terminological heterogeneity, where diverse phrasings for the same diagnostic concept impede reliable text conditioning. We introduce UniPath, a semantics-driven pathology image generation framework that leverages mature diagnostic understanding to enable controllable generation. UniPath implements Multi-Stream Control: a Raw-Text stream; a High-Level Semantics stream that uses learnable queries to a frozen pathology MLLM to distill paraphrase-robust Diagnostic Semantic Tokens and to expand prompts into diagnosis-aware attribute bundles; and a Prototype stream that affords component-level morphological control via a prototype bank. On the data front, we curate a 2.65M image-text corpus and a finely annotated, high-quality 68K subset to alleviate data scarcity. For a comprehensive assessment, we establish a four-tier evaluation hierarchy tailored to pathology. Extensive experiments demonstrate UniPath's state-of-the-art performance, including a Patho-FID of 80.9 (51% better than the second-best) and fine-grained semantic control achieving 98.7% of the real-image level. The curated datasets, complete source code, and pre-trained model weights developed in this study will be made openly accessible to the public.
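The High-Level Semantics stream described above relies on learnable queries that read from a frozen model. The abstract does not give the architecture, so the following is only a minimal sketch of one common way to realize that idea: scaled dot-product attention in which a small set of query vectors pools a fixed set of feature vectors into a fixed number of tokens. The function name `distill_tokens`, the dimensions, and the random stand-in features are all assumptions made for illustration, not details from the paper.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def distill_tokens(queries, frozen_features):
    """Pool frozen features into one token per query (hypothetical helper).

    queries: K vectors of size d; in a real system these would be trained.
    frozen_features: N vectors of size d from a frozen encoder (stand-in data here).
    Returns K vectors of size d, each a softmax-weighted average of the features.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    tokens = []
    for q in queries:
        weights = softmax([dot(q, f) * scale for f in frozen_features])
        token = [
            sum(w * f[i] for w, f in zip(weights, frozen_features))
            for i in range(dim)
        ]
        tokens.append(token)
    return tokens


# Stand-in data: 5 frozen feature vectors and 3 query vectors, all of size 8.
rng = random.Random(0)
features = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
queries = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]
tokens = distill_tokens(queries, features)
```

Because the number of output tokens depends only on the number of queries, the conditioning signal keeps a fixed size regardless of how long or how variously phrased the input is, which is one plausible reason a query-based design suits paraphrase-robust conditioning.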
Problem

Research questions and friction points this paper is trying to address.

Generates pathology images using diagnostic semantics
Addresses data scarcity with a curated image-text corpus
Enables fine-grained semantic control via prototype bank
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using diagnostic semantic tokens for robust text conditioning
Implementing multi-stream control with prototype-level morphological guidance
Curating a large-scale pathology image-text corpus to alleviate data scarcity
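The prototype-level guidance mentioned above depends on selecting entries from a prototype bank. The abstract does not say how the bank is queried, so the sketch below assumes the simplest option: each prototype is a named embedding vector, and the entries closest to a query embedding by cosine similarity are returned. The component names, the three-dimensional vectors, and the `retrieve_prototypes` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def retrieve_prototypes(bank, query, top_k=1):
    """Return the names of the top_k prototypes most similar to the query."""
    ranked = sorted(bank, key=lambda name: cosine(bank[name], query), reverse=True)
    return ranked[:top_k]


# Hypothetical bank: one toy embedding per morphological component.
bank = {
    "gland": [1.0, 0.0, 0.0],
    "stroma": [0.0, 1.0, 0.0],
    "lymphocyte": [0.0, 0.0, 1.0],
}

# A query that is mostly "gland" with a small "stroma" component.
selected = retrieve_prototypes(bank, [0.9, 0.1, 0.0], top_k=2)
print(selected)  # ['gland', 'stroma']
```

In a generator, the selected prototype vectors would be passed along as an extra conditioning input next to the text-derived tokens; how UniPath actually injects them is not specified in the text above.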
Minghao Han
College of Intelligent Robotics and Advanced Manufacturing, Fudan University
YiChen Liu
School of Intelligent Science and Technology, University of Science and Technology Beijing
Yizhou Liu
MIT
Dynamical systems, Statistical physics, Physics of living systems, Physics of AI
Zizhi Chen
Fudan University
Pathology Images
Jingqun Tang
ByteDance Inc.
Computer Vision, Document Intelligence, MLLM, Multimodal Generative Models
Xuecheng Wu
School of Computer Science and Technology, Xi’an Jiaotong University
Dingkang Yang
ByteDance
Multimodal Learning, Generative AI, Embodied AI
Lihua Zhang
Wuhan University
computational biology, bioinformatics, data mining