🤖 AI Summary
This work addresses the challenge of applying general-purpose vision foundation models to medical image segmentation, where domain shift often hinders direct deployment. The authors propose GuiDINO, a framework that repurposes DINOv2/v3 as a visual guidance generator. Through a lightweight TokenBook mechanism, GuiDINO produces spatial guide masks that gate feature activations in diverse medical segmentation backbones, such as nnUNet, thereby injecting foundation-model priors while preserving the backbones' inductive biases. The approach avoids full fine-tuning: training combines a guide-supervision loss with an optional boundary-focused hinge loss, and supports parameter-efficient LoRA adaptation of the guide backbone. Experiments across multiple medical datasets show that GuiDINO consistently improves segmentation accuracy and boundary robustness, outperforming conventional fine-tuning baselines.
📝 Abstract
Foundation vision models are increasingly adopted in medical image analysis, yet domain shift leaves these pretrained models misaligned with medical image segmentation unless they are fully fine-tuned or lightly adapted. We introduce GuiDINO, a framework that repositions a native foundation model as a visual guidance generator for downstream segmentation. GuiDINO extracts visual feature representations from DINOv3 and converts them into a spatial guide mask via a lightweight TokenBook mechanism, which aggregates token-prototype similarities. This guide mask gates feature activations in multiple segmentation backbones, thereby injecting foundation-model priors while preserving the inductive biases and efficiency of dedicated medical architectures. Training relies on a guide-supervision loss that aligns the guide mask with ground-truth regions, optionally augmented by a boundary-focused hinge loss to sharpen fine structures. GuiDINO also supports parameter-efficient adaptation through LoRA on the DINOv3 guide backbone. Across diverse medical datasets and nnUNet-style inference, GuiDINO consistently improves segmentation quality and boundary robustness, offering a practical alternative to full fine-tuning and a new perspective on how foundation models can best serve medical vision. Code is available at https://github.com/Hi-FishU/GuiDINO.
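The TokenBook idea described above can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: it assumes cosine token-prototype similarities, max-over-prototypes aggregation, a sigmoid squashing of the mask, and residual gating of backbone features. All function names and shapes are assumptions; see the repository linked above for the actual method.

```python
import numpy as np

def token_book_guide(tokens, prototypes, grid_hw):
    """Hypothetical TokenBook sketch.

    tokens:     (N, D) patch tokens from the DINOv3 guide backbone
    prototypes: (K, D) learned TokenBook prototypes
    grid_hw:    (H, W) patch grid, with H * W == N
    Returns a spatial guide mask of shape (H, W) with values in (0, 1).
    """
    t = tokens / np.linalg.norm(tokens, axis=-1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    sim = t @ p.T                       # (N, K) cosine token-prototype similarities
    score = sim.max(axis=-1)            # aggregate: best-matching prototype per token
    h, w = grid_hw
    mask = score.reshape(h, w)          # fold tokens back onto the patch grid
    return 1.0 / (1.0 + np.exp(-mask))  # squash to (0, 1) -- an assumed choice

def gate_features(feat, guide):
    """Residual gating of backbone features (C, H, W) by the guide mask (H, W)."""
    return feat * (1.0 + guide[None])   # broadcast over channels; keeps original signal
```

A guide-supervision loss would then compare the mask against downsampled ground-truth regions, so the prototypes learn to fire on anatomy of interest while the segmentation backbone itself stays frozen or lightly adapted.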