Measuring and Aligning Abstraction in Vision-Language Models with Medical Taxonomies

📅 2026-01-21
🤖 AI Summary
Current vision-language models perform strongly on chest X-ray classification, but the flat metrics used to evaluate them do not distinguish between errors with very different clinical consequences, so severe abstraction-level misalignments go unnoticed. To address this, the work introduces "catastrophic abstraction errors" (mistakes that cross branches of the taxonomy) and benchmarks several state-of-the-art models with hierarchical metrics grounded in medical taxonomies. It further proposes risk-constrained thresholding and taxonomy-aware fine-tuning with radial embeddings to align model representations with the hierarchy of medical knowledge. Experiments show that these methods reduce severe abstraction errors to below 2% while maintaining competitive performance, supporting safer and more clinically meaningful deployment.
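To make the idea of a cross-branch mistake concrete, here is a minimal sketch of how such errors could be counted. The toy taxonomy, the label names, and the helper functions are all hypothetical and chosen only for illustration; the paper's exact definition of a catastrophic abstraction error and the taxonomy it uses may differ.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); not the taxonomy used in the paper.
PARENT: Dict[str, Optional[str]] = {
    "finding": None,
    "lung": "finding",
    "cardiac": "finding",
    "pneumonia": "lung",
    "atelectasis": "lung",
    "cardiomegaly": "cardiac",
}


def ancestors(label: str) -> List[str]:
    """Return the path from a label up to the root, label first."""
    path = [label]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def top_branch(label: str) -> str:
    """Return the top-level branch (child of the root) that contains the label."""
    path = ancestors(label)
    # path[-1] is the root; path[-2] is the branch directly below it.
    return path[-2] if len(path) >= 2 else path[-1]


def is_cross_branch(true_label: str, pred_label: str) -> bool:
    """Assumed criterion: an error is severe if it leaves the true label's branch."""
    return top_branch(true_label) != top_branch(pred_label)


def cross_branch_error_rate(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of all predictions that land in a different top-level branch."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = sum(is_cross_branch(t, p) for t, p in zip(y_true, y_pred))
    return errors / len(y_true)


y_true = ["pneumonia", "pneumonia", "cardiomegaly", "atelectasis"]
y_pred = ["pneumonia", "atelectasis", "pneumonia", "cardiomegaly"]
flat_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
severe_rate = cross_branch_error_rate(y_true, y_pred)
```

In this toy example, flat accuracy treats all three mistakes alike, whereas the cross-branch rate separates the within-branch confusion (pneumonia vs. atelectasis) from the two predictions that jump between the lung and cardiac branches.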

📝 Abstract
Vision-Language Models show strong zero-shot performance for chest X-ray classification, but standard flat metrics fail to distinguish between clinically minor and severe errors. This work investigates how to quantify and mitigate abstraction errors by leveraging medical taxonomies. We benchmark several state-of-the-art VLMs using hierarchical metrics and introduce Catastrophic Abstraction Errors to capture cross-branch mistakes. Our results reveal substantial misalignment of VLMs with clinical taxonomies despite high flat performance. To address this, we propose risk-constrained thresholding and taxonomy-aware fine-tuning with radial embeddings, which reduce severe abstraction errors to below 2 per cent while maintaining competitive performance. These findings highlight the importance of hierarchical evaluation and representation-level alignment for safer and more clinically meaningful deployment of VLMs.
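The abstract does not spell out how risk-constrained thresholding works, so the following is only a sketch under stated assumptions: a confidence threshold is selected on held-out data so that the rate of severe errors among accepted predictions stays at or below a target risk, and predictions below the threshold would be deferred or backed off to a coarser label. The function name `pick_threshold` and its arguments are hypothetical.

```python
from typing import List, Optional


def pick_threshold(
    confidences: List[float],
    is_severe_error: List[bool],
    max_risk: float,
) -> Optional[float]:
    """Return the smallest confidence threshold whose accepted set meets the risk target.

    A prediction is accepted when its confidence is >= the threshold. The risk
    is the fraction of accepted predictions that are severe errors. Returns
    None if no candidate threshold satisfies the constraint.
    """
    if len(confidences) != len(is_severe_error):
        raise ValueError("inputs must have the same length")
    # Only observed confidence values can change the accepted set.
    for t in sorted(set(confidences)):
        kept = [e for c, e in zip(confidences, is_severe_error) if c >= t]
        risk = sum(kept) / len(kept)  # kept is non-empty because t is observed
        if risk <= max_risk:
            return t
    return None


# Hypothetical validation scores and severe-error flags.
conf = [0.9, 0.8, 0.6, 0.55, 0.3]
severe = [False, False, True, False, True]
strict = pick_threshold(conf, severe, max_risk=0.02)
loose = pick_threshold(conf, severe, max_risk=0.30)
```

A tighter risk target pushes the threshold up and accepts fewer predictions, which is the trade-off between coverage and safety that any such scheme has to balance.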
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
Medical Taxonomies
Abstraction Errors
Hierarchical Evaluation
Chest X-ray Classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

abstraction alignment
medical taxonomies
vision-language models
hierarchical evaluation
radial embeddings
Ben Schaper
School of Computation, Information and Technology, Technical University of Munich, Germany
Maxime Di Folco
School of Computation, Information and Technology, Technical University of Munich, Germany; Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich, Germany; LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Bernhard Kainz
FAU Erlangen-Nürnberg, Imperial College London
human-in-the-loop computing, machine learning, medical image analysis
J. A. Schnabel
School of Computation, Information and Technology, Technical University of Munich, Germany; Institute of Machine Learning in Biomedical Imaging, Helmholtz Munich, Germany; Munich Center for Machine Learning (MCML); School of Biomedical Engineering and Imaging Sciences, King’s College London, UK
Cosmin I. Bercea
Technical University of Munich
Computer Vision, Multimodal Learning, Generative AI, Anomaly Detection, Medical Image Analysis