Universal Boosts, Specific Suppressors: Sparse Autoencoder Steering of Medical Vision-Language Models

📅 2026-05-24

🤖 AI Summary

Medical vision-language models often generate hallucinated chest X-ray reports, manifesting as fabricated, omitted, or mislocalized findings. This work proposes a fine-tuning-free, inference-stage intervention that applies token-level residual modulation to late-layer features using sparse autoencoders. The method leverages a universal “enhancement” direction alongside model-specific “suppression” directions to provide precise guidance during report generation. Analysis reveals that enhancement directions are consistent across models, whereas suppression directions require customization, informing a cross-model transfer strategy. Evaluated on MIMIC-CXR, the approach improves clinical composite scores by 5.4%, 7.2%, and 17.0% for RadVLM, LLaVA-Rad, and CheXOne, respectively. It also achieves a +7.7% gain in GREEN score under zero-shot transfer to IU-Xray.
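
One way to picture the cross-model comparison is to match SAE features by how similarly they fire on the same inputs. The sketch below is illustrative only: the summary does not state how the alignment was computed, so the activation-correlation criterion, the function names `best_match_correlation` and `shared_fraction`, and the `threshold` value are all assumptions.

```python
import numpy as np

def best_match_correlation(acts_a, acts_b, eps=1e-8):
    """For each feature of model A, find its most correlated feature in model B.

    acts_a: (n_samples, n_features_a) SAE activations of model A
    acts_b: (n_samples, n_features_b) SAE activations of model B,
            computed on the same samples (assumed setup).
    """
    # Center and L2-normalize each feature column so that the dot product
    # between two columns equals their Pearson correlation.
    a = acts_a - acts_a.mean(axis=0)
    b = acts_b - acts_b.mean(axis=0)
    a = a / (np.linalg.norm(a, axis=0) + eps)
    b = b / (np.linalg.norm(b, axis=0) + eps)
    corr = a.T @ b                      # (n_features_a, n_features_b)
    return corr.max(axis=1), corr.argmax(axis=1)

def shared_fraction(acts_a, acts_b, feature_ids, threshold=0.5):
    """Fraction of the given model-A features with a close match in model B."""
    best, _ = best_match_correlation(acts_a, acts_b)
    return float(np.mean(best[feature_ids] >= threshold))
```

Under this reading, a high `shared_fraction` for a boost feature set and a low one for a suppress feature set would correspond to the pattern described above: boost directions that carry over between models and suppress directions that do not.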

📝 Abstract

Medical vision-language models (VLMs) often hallucinate when generating chest X-ray reports: they fabricate findings that are not present in the image, miss important ones, or localize them incorrectly. We mitigate this without weight updates by steering the residual stream at decoding time in a per-token sparse autoencoder (SAE) basis: we fit Top-$K$ SAEs on late layers, use causal steering to identify features linked to clinical errors, and then apply a combined suppress/boost intervention at inference time. On the MIMIC-CXR test split, our inference-only method improves the quality of generated reports for three radiology VLMs (RadVLM, LLaVA-Rad, and CheXOne), with relative improvements of +5.4%, +7.2%, and +17.0% in the clinical composite metric, and statistically significant GREEN gains on all backbones. A cross-model feature alignment shows that the quality-promoting (boost) directions overlap strongly across architectures, whereas hallucination-linked (suppress) directions are model-specific. Transferable steering must therefore handle suppression per backbone rather than share a universal suppress list. The same recipe transfers zero-shot to IU-Xray (GREEN $+7.7\%$ relative) without retraining, confirming that the identified features are properties of the model, not of the training corpus. We release causal feature sets and an interactive feature dashboard: https://cxr-sparse-feature-dashboard.netlify.app/.
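
The suppress/boost intervention can be sketched for a single layer as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the strengths `alpha` and `beta`, and the choice to add only the decoded change in feature activations back to the residual are hypothetical.

```python
import numpy as np

def topk_sae_encode(h, W_enc, b_enc, k):
    """Top-K SAE encoder: keep the k largest non-negative pre-activations per token."""
    pre = np.maximum(h @ W_enc + b_enc, 0.0)          # (n_tokens, n_features)
    idx = np.argpartition(pre, -k, axis=-1)[:, -k:]   # indices of the top-k features
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return z

def steer_residual(h, W_enc, b_enc, W_dec, k, suppress, boost, alpha=1.0, beta=1.0):
    """Return per-token residuals after a combined suppress/boost edit.

    suppress, boost: lists of SAE feature indices (hypothetical feature sets).
    alpha: fraction of each active suppress feature to remove (assumed knob).
    beta:  extra fraction of each active boost feature to add (assumed knob).
    """
    z = topk_sae_encode(h, W_enc, b_enc, k)
    delta = np.zeros_like(z)
    delta[:, suppress] = -alpha * z[:, suppress]
    delta[:, boost] = beta * z[:, boost]
    # Only the change in feature activations is decoded and added, so whatever
    # the SAE fails to reconstruct in h is left untouched.
    return h + delta @ W_dec
```

Because the edit is computed per token and added to the residual stream, it needs no weight updates. Only features that are active in a token's Top-K set are modified in this sketch, so tokens that never fire a suppress or boost feature pass through unchanged.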

Problem

Research questions and friction points this paper is trying to address.

medical vision-language models

hallucination

chest X-ray report generation

clinical errors

model reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse autoencoder

residual steering

medical vision-language models

hallucination mitigation

inference-time intervention
