🤖 AI Summary
Vision-language models (VLMs) exhibit high sensitivity to clinical data scale, task formulation, and prompt design in zero-shot pathological diagnosis. Method: This study systematically evaluates Quilt-Net, Quilt-LLaVA, and CONCH on gastrointestinal whole-slide images (WSIs) for zero-shot diagnosis, identifying anatomical precision as a critical determinant of diagnostic accuracy. We propose a structured, multi-dimensional prompt engineering framework integrating domain specificity, anatomical precision, instructional framing, and output constraints. Contribution/Results: Ablation studies demonstrate that missing anatomical context significantly degrades performance, whereas model complexity alone is not decisive; CONCH achieves the highest diagnostic accuracy under precise anatomical prompting. Our work establishes an interpretable, reusable prompt design paradigm for reliable clinical deployment of medical VLMs.
📝 Abstract
Vision-language models (VLMs) have gained significant attention in computational pathology due to their multimodal learning capabilities that enhance big-data analysis of giga-pixel whole-slide images (WSIs). However, their sensitivity to large-scale clinical data, task formulations, and prompt design remains an open question, particularly in terms of diagnostic accuracy. In this paper, we present a systematic investigation and analysis of three state-of-the-art VLMs for histopathology, namely Quilt-Net, Quilt-LLaVA, and CONCH, on an in-house digestive pathology dataset comprising 3,507 giga-pixel WSIs across distinct tissue types. Through a structured ablation study on cancer invasiveness and dysplasia status, we develop a comprehensive prompt engineering framework that systematically varies domain specificity, anatomical precision, instructional framing, and output constraints. Our findings demonstrate that prompt engineering significantly impacts model performance, with the CONCH model achieving the highest accuracy when provided with precise anatomical references. Additionally, we identify the critical importance of anatomical context in histopathological image analysis, as performance consistently degraded when anatomical precision was reduced. We also show that model complexity alone does not guarantee superior performance, as effective domain alignment and domain-specific training are critical. These results establish foundational guidelines for prompt engineering in computational pathology and highlight the potential of VLMs to enhance diagnostic accuracy when properly instructed with domain-appropriate prompts.
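To make the prompt dimensions concrete, the sketch below shows one way such a grid of prompt variants could be generated and scored in a CLIP-style zero-shot setup. It is a minimal illustration, not the authors' actual templates or pipeline: the prefix wordings, anatomical terms, class labels, and placeholder embeddings are all hypothetical, and in practice the embeddings would come from a pathology VLM encoder such as CONCH.

```python
import itertools
import numpy as np

# Prompt dimensions from the ablation; the exact wordings here are illustrative.
DOMAIN_PREFIX = {
    "generic": "An image of",
    "domain":  "A histopathology slide of",
}
ANATOMY = {
    "none":    "",
    "coarse":  "gastrointestinal tissue",
    "precise": "colonic mucosa",  # hypothetical precise anatomical reference
}
INSTRUCTION = {
    "plain":  "",
    "framed": "Assess invasiveness and dysplasia status.",
}
LABELS = ["invasive carcinoma", "high-grade dysplasia", "low-grade dysplasia", "benign"]


def build_prompts(domain, anatomy, instruction):
    """Compose one class-wise prompt set for a given configuration."""
    prompts = []
    for label in LABELS:
        parts = [DOMAIN_PREFIX[domain]]
        if ANATOMY[anatomy]:
            parts.append(ANATOMY[anatomy])
        parts.append(f"showing {label}.")
        if INSTRUCTION[instruction]:
            parts.append(INSTRUCTION[instruction])
        prompts.append(" ".join(parts))
    return prompts


def zero_shot_scores(image_emb, text_embs, temperature=0.07):
    """CLIP-style zero-shot classification: softmax over cosine similarities."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = text_embs @ image_emb / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


if __name__ == "__main__":
    # Enumerate the full prompt grid spanned by the three dimensions.
    for cfg in itertools.product(DOMAIN_PREFIX, ANATOMY, INSTRUCTION):
        print(cfg, "->", build_prompts(*cfg)[0])

    # Placeholder embeddings stand in for encoder outputs (e.g. a CONCH-like
    # image/text encoder); dimensions and values are purely illustrative.
    rng = np.random.default_rng(0)
    probs = zero_shot_scores(rng.normal(size=512), rng.normal(size=(len(LABELS), 512)))
    print(dict(zip(LABELS, probs.round(3))))
```

Varying one dimension at a time over such a grid (e.g. dropping the anatomical term while holding the rest fixed) is what makes the ablation interpretable: each accuracy change can be attributed to a single prompt component.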