🤖 AI Summary
This work addresses the challenge of applying general-purpose vision foundation models to medical image segmentation, where domain shift often hinders direct deployment. The authors propose GuiDINO, a framework that repurposes DINOv2/v3 as a visual guidance generator. Through a lightweight TokenBook mechanism, GuiDINO produces spatial guide masks that gate feature activations in diverse medical segmentation backbones, such as nnUNet, thereby injecting foundation-model priors while preserving the backbones' inductive biases. The approach avoids full fine-tuning: training combines a guide-supervision loss with an optional boundary-focused hinge loss, and supports parameter-efficient LoRA adaptation of the guide backbone. Experiments across multiple medical datasets show that GuiDINO consistently improves segmentation accuracy and boundary robustness, outperforming conventional fine-tuning baselines.
📝 Abstract
Foundation vision models are increasingly adopted in medical image analysis, yet domain shift leaves these pretrained models misaligned with medical image segmentation unless they are fully fine-tuned or lightly adapted. We introduce GuiDINO, a framework that repositions a native foundation model as a visual guidance generator for downstream segmentation. GuiDINO extracts visual feature representations from DINOv3 and converts them into a spatial guide mask via a lightweight TokenBook mechanism, which aggregates token-prototype similarities. This guide mask gates feature activations in multiple segmentation backbones, thereby injecting foundation-model priors while preserving the inductive biases and efficiency of dedicated medical architectures. Training relies on a guide-supervision loss that aligns the guide mask with ground-truth regions, optionally augmented by a boundary-focused hinge loss to sharpen fine structures. GuiDINO also supports parameter-efficient adaptation through LoRA on the DINOv3 guide backbone. Across diverse medical datasets and nnUNet-style inference, GuiDINO consistently improves segmentation quality and boundary robustness, offering a practical alternative to full fine-tuning and a new perspective on how foundation models can best serve medical vision. Code is available at https://github.com/Hi-FishU/GuiDINO.
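The TokenBook idea described above can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: it assumes cosine token-prototype similarities, max-over-prototypes aggregation, a sigmoid squashing of the mask, and residual gating of backbone features. All function names and shapes are assumptions; see the repository linked above for the actual method.

```python
import numpy as np

def token_book_guide(tokens, prototypes, grid_hw):
    """Hypothetical TokenBook sketch.

    tokens:     (N, D) patch tokens from the DINOv3 guide backbone
    prototypes: (K, D) learned TokenBook prototypes
    grid_hw:    (H, W) patch grid, with H * W == N
    Returns a spatial guide mask of shape (H, W) with values in (0, 1).
    """
    t = tokens / np.linalg.norm(tokens, axis=-1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    sim = t @ p.T                       # (N, K) cosine token-prototype similarities
    score = sim.max(axis=-1)            # aggregate: best-matching prototype per token
    h, w = grid_hw
    mask = score.reshape(h, w)          # fold tokens back onto the patch grid
    return 1.0 / (1.0 + np.exp(-mask))  # squash to (0, 1) -- an assumed choice

def gate_features(feat, guide):
    """Residual gating of backbone features (C, H, W) by the guide mask (H, W)."""
    return feat * (1.0 + guide[None])   # broadcast over channels; keeps original signal
```

A guide-supervision loss would then compare the mask against downsampled ground-truth regions, so the prototypes learn to fire on anatomy of interest while the segmentation backbone itself stays frozen or lightly adapted.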