Universal Boosts, Specific Suppressors: Sparse Autoencoder Steering of Medical Vision-Language Models

📅 2026-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical vision-language models often hallucinate when generating chest X-ray reports, producing fabricated, omitted, or mislocalized findings. This work proposes an inference-time intervention that requires no fine-tuning: it applies token-level residual steering to late-layer features learned by sparse autoencoders. The method combines “enhancement” (boost) directions, which are shared across models, with model-specific “suppression” directions to guide report generation. Cross-model analysis shows that enhancement directions are consistent across architectures, whereas suppression directions must be selected per model, which informs the cross-model transfer strategy. On MIMIC-CXR, the approach yields relative improvements in the clinical composite metric of 5.4%, 7.2%, and 17.0% for RadVLM, LLaVA-Rad, and CheXOne, respectively. It also achieves a +7.7% relative gain in GREEN score under zero-shot transfer to IU-Xray.
📝 Abstract
Medical vision-language models (VLMs) often hallucinate findings when generating chest X-ray reports: they fabricate findings that are not present in the image, miss important ones, or locate them incorrectly. We mitigate this without weight updates by decoding-time residual steering on a per-token sparse autoencoder (SAE) basis: Top-$K$ SAEs on late layers, causal steering against clinical errors, then combined suppress/boost intervention at inference time. On the MIMIC-CXR test split, our inference-only method improves the quality of generated reports for three radiology VLMs (RadVLM, LLaVA-Rad, and CheXOne), with relative improvements of +5.4%, +7.2%, and +17.0% in the clinical composite metric, and statistically significant GREEN gains on all backbones. A cross-model feature alignment shows that the quality-promoting (boost) directions overlap strongly across architectures, whereas hallucination-linked (suppress) directions are model-specific. Therefore, transferable steering must treat suppression per-backbone, rather than sharing a universal suppress list. The same recipe transfers zero-shot to IU-Xray (GREEN $+7.7\%$ relative) without retraining, confirming that the identified features are properties of the model, not of the training corpus. We release causal feature sets and an interactive feature dashboard: https://cxr-sparse-feature-dashboard.netlify.app/.
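To make the suppress/boost idea in the abstract concrete, here is a minimal pure-Python sketch of Top-K SAE encoding followed by a residual edit on a single token vector. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `alpha`/`beta` strengths, and the choice to scale only features that are already active are all hypothetical, and a real system would operate on tensors inside the model's forward pass.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def top_k_encode(h: Sequence[float], w_enc: Matrix, b_enc: Sequence[float], k: int) -> Vector:
    """Top-K SAE encoder: keep the k largest pre-activations (after ReLU), zero the rest."""
    pre = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(w_enc, b_enc)]
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    z = [0.0] * len(pre)
    for i in keep:
        z[i] = max(pre[i], 0.0)
    return z


def steer(
    h: Sequence[float],
    w_enc: Matrix,
    b_enc: Sequence[float],
    w_dec: Matrix,              # w_dec[i] is the decoder direction of feature i
    k: int,
    suppress: Sequence[int],    # hypothetical per-model list of feature indices
    boost: Sequence[int],       # hypothetical shared list of feature indices
    alpha: float = 1.0,         # assumed suppression strength
    beta: float = 1.0,          # assumed boost strength
) -> Vector:
    """Edit one residual vector: subtract suppressed features, amplify boosted ones."""
    z = top_k_encode(h, w_enc, b_enc, k)
    out = list(h)
    for i in suppress:
        for d in range(len(out)):
            out[d] -= alpha * z[i] * w_dec[i][d]
    for i in boost:
        for d in range(len(out)):
            out[d] += beta * z[i] * w_dec[i][d]
    return out


if __name__ == "__main__":
    # Toy 3-dimensional residual with 4 SAE features (made-up numbers).
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]
    b = [0.0, 0.0, 0.0, 0.0]
    h = [1.0, 2.0, 0.5]
    print(steer(h, w, b, w, k=2, suppress=[1], boost=[0], alpha=1.0, beta=0.5))
```

Because the edit is expressed through decoder directions, keeping a shared `boost` list while swapping the `suppress` list per backbone mirrors the transfer finding described in the abstract; how the authors actually select and scale features is not specified here.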
Problem

Research questions and friction points this paper is trying to address.

medical vision-language models
hallucination
chest X-ray report generation
clinical errors
model reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse autoencoder
residual steering
medical vision-language models
hallucination mitigation
inference-time intervention
Farhad Nooralahzadeh
University of Zurich and University Hospital of Zurich, Switzerland; Zurich University of Applied Sciences, Switzerland
Benjamin Gundersen
University of Zurich and University Hospital of Zurich, Switzerland
Nicolas Deperrois
University of Zurich and University Hospital of Zurich, Switzerland
Hidetoshi Matsuo
Kobe University
Medical imaging, Deep learning
Mizuho Nishio
Kyoto University
Medical image analysis, Machine learning, Deep Learning, Radiology, Computer Vision
Thomas Frauenfelder
University Hospital Zurich
Radiology
Ahmed Allam
Yale University School of Medicine
machine learning in healthcare
Christian Blüthgen
Stanford University, USA
Michael Moor
MD, PhD. Assistant Professor at ETH Zurich. Previously: Stanford, Computer Science.
Medical AI, Foundation models, LLMs, Agents, Reasoning
Michael Krauthammer
University of Zurich
Biomedical Informatics