AI Summary
When directly applied to nuclear instance segmentation, the general-purpose segmentation model SAM struggles to capture local structural details, and full-parameter fine-tuning incurs prohibitive computational costs. To address this, the work proposes a parameter-efficient fine-tuning framework that freezes the SAM backbone while integrating three key components: a multi-scale adaptive local-aware adapter, a hierarchical modulation fusion module, and a boundary-guided mask refinement mechanism. These components jointly enhance the model's ability to delineate nuclear boundaries with high fidelity. The proposed approach significantly improves segmentation accuracy, enables sharp boundary reconstruction, and drastically reduces both the number of trainable parameters and the computational overhead.
Abstract
Nuclei instance segmentation is critical in computational pathology for cancer diagnosis and prognosis. Recently, the Segment Anything Model (SAM) has demonstrated exceptional performance across a variety of segmentation tasks, leveraging rich priors and powerful global context modeling capabilities derived from large-scale pre-training on natural images. However, directly applying SAM to the medical imaging domain faces significant limitations: it lacks sufficient perception of the local structural features that are crucial for nuclei segmentation, and full fine-tuning for downstream tasks incurs substantial computational costs. To efficiently transfer SAM's robust prior knowledge to nuclei instance segmentation while supplementing it with task-aware local perception, we propose a parameter-efficient fine-tuning framework, named Cooperative Fine-Grained Refinement of SAM, consisting of three core components: 1) a Multi-scale Adaptive Local-aware Adapter, which enables effective capability transfer by augmenting the frozen SAM backbone with minimal additional parameters and instilling a powerful perception of local structures through dynamically generated, multi-scale convolutional kernels; 2) a Hierarchical Modulated Fusion Module, which dynamically aggregates multi-level encoder features to preserve fine-grained spatial details; and 3) a Boundary-Guided Mask Refinement module, which integrates multi-context boundary cues with semantic features under explicit supervision, producing a boundary-focused signal that refines the initial mask predictions for sharper delineation. These three components work cooperatively to enhance local perception, preserve spatial details, and refine boundaries, enabling SAM to perform accurate nuclei instance segmentation directly.
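The abstract does not give implementation details, but the core adapter idea (a frozen backbone augmented with a small module whose multi-scale convolution kernels are generated dynamically from the input features) can be illustrated with a minimal, framework-free sketch. Everything below is an assumption for illustration: the function names (`depthwise_conv2d`, `multiscale_adapter`), the use of global average pooling as the context signal, the scale set `(3, 5)`, and the softmax-gated fusion are hypothetical choices, not the paper's actual design.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """'Same'-padded per-channel 2D convolution.
    x: (C, H, W) feature map; kernels: (C, k, k)."""
    C, H, W = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    # Sliding windows over the spatial axes -> (C, H, W, k, k)
    win = np.lib.stride_tricks.sliding_window_view(xp, (k, k), axis=(1, 2))
    return np.einsum('chwij,cij->chw', win, kernels)

def multiscale_adapter(feat, rng, scales=(3, 5)):
    """Toy multi-scale local-aware adapter (hypothetical sketch).
    Kernels are *generated* from a pooled context vector rather than
    stored as fixed weights, and the per-scale outputs are fused with
    a softmax gate; the adapter is residual, so the backbone feature
    passes through unchanged plus a small learned correction."""
    C, H, W = feat.shape
    ctx = feat.mean(axis=(1, 2))                 # (C,) global context vector
    outs = []
    for k in scales:
        # Hypothetical kernel-generator weights (would be learned in practice)
        Wk = rng.standard_normal((k * k, C)) * 0.01
        kern = (Wk @ ctx).reshape(k, k)          # dynamic kernel from context
        kern = np.broadcast_to(kern, (C, k, k))  # shared across channels here
        outs.append(depthwise_conv2d(feat, kern))
    # Softmax gate over scales, also conditioned on the context vector
    Vg = rng.standard_normal((len(scales), C)) * 0.01
    logits = Vg @ ctx
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    fused = sum(wi * oi for wi, oi in zip(w, outs))
    return feat + fused                          # residual connection

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))            # (C, H, W) toy feature map
out = multiscale_adapter(feat, rng)
print(out.shape)                                 # (4, 8, 8), same as input
```

Because only the small kernel-generator and gate weights would be trainable, while the backbone producing `feat` stays frozen, this mirrors the parameter-efficiency argument in the abstract: the trainable parameter count scales with the adapter, not with SAM.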