BLO-Inst: Bi-Level Optimization Based Alignment of YOLO and SAM for Robust Instance Segmentation

📅 2026-01-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of existing pipelines that combine YOLO and SAM for automated instance segmentation: objective misalignment between detection and segmentation, and overfitting during alignment. To overcome these issues, we propose the first framework that uses bi-level optimization to align a detector with SAM: in the lower level, SAM is fine-tuned on a subset $D_1$ to enhance segmentation fidelity, while in the upper level, the YOLO detector is optimized on a separate subset $D_2$ to generate bounding-box prompts that better facilitate SAM’s segmentation. This approach lets the detector learn a generalizable prompting strategy oriented toward downstream segmentation quality, which mitigates overfitting. Experiments demonstrate that our framework significantly outperforms current baselines across both general and biomedical instance segmentation tasks, achieving more robust zero-shot segmentation performance.
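
The two levels described above can be written compactly as a nested problem. This is a sketch under assumed notation, not the paper's own formulation: $\theta$ (detector parameters), $\phi$ (SAM parameters), and $\mathcal{L}_{\mathrm{seg}}$ (a segmentation loss) are symbols introduced here for illustration; only $D_1$ and $D_2$ come from the summary.

$$
\min_{\theta}\; \mathcal{L}_{\mathrm{seg}}\big(\phi^{*}(\theta),\, \theta;\; D_2\big)
\quad \text{s.t.} \quad
\phi^{*}(\theta) = \arg\min_{\phi}\; \mathcal{L}_{\mathrm{seg}}\big(\phi,\, \theta;\; D_1\big)
$$

The lower-level solution $\phi^{*}(\theta)$ depends on the detector because SAM is fine-tuned on the boxes that the detector currently proposes; the upper level then scores the detector by how well that fine-tuned SAM segments held-out data.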

📝 Abstract

The Segment Anything Model (SAM) has revolutionized image segmentation with its zero-shot capabilities, yet its reliance on manual prompts hinders fully automated deployment. While integrating object detectors as prompt generators offers a pathway to automation, existing pipelines suffer from two fundamental limitations: objective mismatch, where detectors optimized for geometric localization produce boxes that do not match the optimal prompting context required by SAM, and alignment overfitting in standard joint training, where the detector simply memorizes specific prompt adjustments for training samples rather than learning a generalizable policy. To bridge this gap, we introduce BLO-Inst, a unified framework that aligns detection and segmentation objectives through bi-level optimization. We formulate the alignment as a nested optimization problem over disjoint data splits. In the lower level, SAM is fine-tuned to maximize segmentation fidelity given the current detection proposals on a subset ($D_1$). In the upper level, the detector is updated to generate bounding boxes that explicitly minimize the validation loss of the fine-tuned SAM on a separate subset ($D_2$). This effectively transforms the detector into a segmentation-aware prompt generator, optimizing the bounding boxes not just for localization accuracy but for downstream mask quality. Extensive experiments demonstrate that BLO-Inst achieves superior performance, outperforming standard baselines on tasks in both general and biomedical domains.
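
The nested update described above can be illustrated with a deliberately tiny scalar example. This is a minimal sketch, not the BLO-Inst implementation: the "detector" is a single offset `theta` that shifts an input into a prompt, the "segmenter" is a single weight `w`, the loss is a squared error, and the lower level is approximated by one unrolled gradient step (a common simplification that the abstract does not confirm). All names, data, and step sizes here are hypothetical.

```python
# Toy bi-level alignment with scalar stand-ins (all names are hypothetical).
# Prompt = x + theta ("detector"), prediction = w * prompt ("segmenter").

def loss(w, theta, data):
    """Mean squared error of the prediction w * (x + theta) against y."""
    return sum((w * (x + theta) - y) ** 2 for x, y in data) / len(data)

def grad_w(w, theta, data):
    """Partial derivative of the loss with respect to w."""
    return sum(2 * (w * (x + theta) - y) * (x + theta) for x, y in data) / len(data)

def grad_theta(w, theta, data):
    """Partial derivative of the loss with respect to theta (w held fixed)."""
    return sum(2 * (w * (x + theta) - y) * w for x, y in data) / len(data)

def mixed_second(w, theta, data):
    """Derivative of grad_w with respect to theta."""
    return sum(2 * (2 * w * (x + theta) - y) for x, y in data) / len(data)

def hypergradient(w, theta, d1, d2, alpha):
    """d/dtheta of the D2 loss after one lower-level step on D1."""
    w_new = w - alpha * grad_w(w, theta, d1)        # lower level: one step on D1
    dw_dtheta = -alpha * mixed_second(w, theta, d1)  # how that step depends on theta
    # Chain rule: direct effect of theta plus its effect through w_new.
    return grad_theta(w_new, theta, d2) + grad_w(w_new, theta, d2) * dw_dtheta

def train(d1, d2, steps=200, alpha=0.05, beta=0.05):
    """Alternate upper-level (theta on D2) and lower-level (w on D1) updates."""
    w, theta = 0.5, 0.0
    for _ in range(steps):
        theta -= beta * hypergradient(w, theta, d1, d2, alpha)
        w -= alpha * grad_w(w, theta, d1)
    return w, theta

# Disjoint toy splits generated from y = 2 * (x + 1).
D1 = [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0)]
D2 = [(0.5, 3.0), (1.5, 5.0)]

if __name__ == "__main__":
    w, theta = train(D1, D2)
    print("w:", w, "theta:", theta, "loss on D2:", loss(w, theta, D2))
```

The point of the sketch is the second term in `hypergradient`: the detector parameter is credited not only for its direct effect on the held-out loss but also for how it changes the segmenter's fine-tuning step, which is what distinguishes the bi-level update from plain joint training on a single split.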

Problem

Research questions and friction points this paper is trying to address.

objective mismatch

alignment overfitting

instance segmentation

prompt generation

Segment Anything Model

Innovation

Methods, ideas, or system contributions that make the work stand out.

bi-level optimization

instance segmentation

prompt alignment