🤖 AI Summary
This work addresses the longstanding challenge of jointly segmenting grain boundaries and lithological semantics in thin-section petrographic images, a task historically treated in isolation and hindered by color variations due to extinction effects, ultra-fine boundaries, and insufficient exploitation of multi-view information. To overcome these limitations, the authors propose Petro-SAM, a novel framework that adapts the Segment Anything Model (SAM) to this multi-task setting for the first time. Petro-SAM introduces a Merge Block to fuse images from seven polarized light perspectives and incorporates a color entropy prior alongside a multi-scale feature fusion mechanism, enabling prompt-driven joint optimization of boundaries and semantic labels. Evaluated on an expert-annotated dataset, the method substantially outperforms existing single-task and conventional approaches, achieving high-precision collaborative segmentation.
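To make the multi-view idea concrete, the sketch below fuses a stack of polarized views with a simple per-pixel softmax weighting on brightness, so that a grain that is dark (at extinction) in one view contributes little to the fused image. This is only an illustrative baseline under stated assumptions: it is not the Merge Block, whose internals the summary does not describe, and the function name `fuse_polarized_views` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def fuse_polarized_views(stack, temperature=0.1):
    """Fuse V co-registered views into one image.

    stack: array of shape (V, H, W, 3) with values in [0, 1].
    Returns an array of shape (H, W, 3).

    Assumption: brighter views carry more usable colour information,
    so each pixel is a softmax-weighted average over the views.
    This is a hand-crafted stand-in, not a learned fusion module.
    """
    stack = np.asarray(stack, dtype=np.float64)
    if stack.ndim != 4 or stack.shape[-1] != 3:
        raise ValueError("expected an array of shape (V, H, W, 3)")

    # Per-view, per-pixel brightness: shape (V, H, W).
    brightness = stack.mean(axis=-1)

    # Numerically stable softmax over the view axis.
    logits = brightness / temperature
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=0, keepdims=True)

    # Weighted sum over views, broadcasting weights across channels.
    return (weights[..., None] * stack).sum(axis=0)


if __name__ == "__main__":
    # Seven synthetic views; the first one is fully extinct (black).
    views = np.full((7, 4, 4, 3), 0.8)
    views[0] = 0.0
    fused = fuse_polarized_views(views)
    print(fused.shape, float(fused.min()))
```

A learned fusion module would replace the fixed brightness heuristic with trainable weights; the sketch only shows why combining several rotation angles can suppress extinction-induced dark regions.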
📝 Abstract
Grain-edge segmentation (GES) and lithology semantic segmentation (LSS) are two pivotal tasks for quantifying rock fabric and composition. However, the two tasks are usually treated separately, and segmentation quality remains unsatisfactory even though expensive, time-consuming, expert-annotated datasets have been used. Recently, foundation models, especially the Segment Anything Model (SAM), have demonstrated impressive robustness in boundary alignment. However, directly adapting SAM to joint GES and LSS is nontrivial because of 1) a severe domain gap induced by extinction-dependent color variations and ultra-fine grain boundaries, and 2) the lack of modules designed for joint learning on multi-angle petrographic image stacks. In this paper, we propose Petro-SAM, a novel two-stage, multi-task framework for high-quality joint GES and LSS on petrographic images. Specifically, building on SAM, we introduce a Merge Block that integrates seven polarized views, effectively addressing the extinction issue. Moreover, we introduce multi-scale feature fusion and color-entropy priors to refine the detection.
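As an illustration of what a color-entropy prior could look like, the sketch below computes a local Shannon entropy map over quantized RGB colors: the value is near zero inside a uniformly colored grain and rises where several colors meet, as at grain boundaries. The abstract does not define the prior, so this formulation, the function name `color_entropy_map`, and the `window` and `levels` parameters are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def color_entropy_map(image, window=5, levels=4):
    """Local Shannon entropy (in bits) of quantized colours.

    image: array of shape (H, W, 3) with values in [0, 1].
    window: odd side length of the square neighbourhood.
    levels: quantization levels per channel (levels**3 colour codes).
    Returns an (H, W) array of entropies.
    """
    image = np.asarray(image, dtype=np.float64)
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")

    # Quantize each channel and combine into one integer colour code.
    q = np.minimum((image * levels).astype(np.int64), levels - 1)
    codes = (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]

    # Edge-pad so the output keeps the input's spatial size.
    pad = window // 2
    padded = np.pad(codes, pad, mode="edge")
    patches = sliding_window_view(padded, (window, window))  # (H, W, k, k)

    n = window * window
    entropy = np.zeros(codes.shape, dtype=np.float64)
    # Accumulate -p * log2(p) for every colour code present in the image.
    for code in np.unique(codes):
        p = (patches == code).sum(axis=(-2, -1)) / n
        present = p > 0
        entropy[present] -= p[present] * np.log2(p[present])
    return entropy


if __name__ == "__main__":
    # Two synthetic "grains": black on the left, white on the right.
    img = np.zeros((8, 8, 3))
    img[:, 4:] = 1.0
    ent = color_entropy_map(img)
    print(ent.round(2))
```

In a segmentation pipeline, a map like this could be supplied as an extra input channel or used to weight a boundary loss; which of these, if either, Petro-SAM uses is not stated in the abstract.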