🤖 AI Summary
SAM performs poorly on scene text segmentation when prompted with bounding boxes: word-level boxes are too coarse to isolate individual characters, while character-level boxes lead to over-segmentation and under-segmentation. Both problems limit its usefulness for text annotation. To address this, we propose Char-SAM, a training-free, character-level automatic annotation pipeline. It introduces two modules. Character Bounding-box Refinement (CBR) turns the word-level bounding boxes found in existing text detection datasets into finer-grained character-level box prompts. Character Glyph Refinement (CGR) then adds glyph information for each character category as a further prompt, guiding SAM toward masks with less over-segmentation and under-segmentation. The method relies entirely on the bbox-to-mask capability of the pre-trained SAM and requires no additional training. Experiments on TextSeg validate its effectiveness. Because it is training-free, it can also generate high-quality text segmentation masks from real-world datasets such as COCO-Text and MLT17, reducing the annotation cost of fine-grained text segmentation.
📝 Abstract
The recent emergence of the Segment Anything Model (SAM) enables various domain-specific segmentation tasks to be tackled cost-effectively by using bounding boxes as prompts. However, in scene text segmentation, SAM cannot achieve desirable performance. Word-level bounding boxes are too coarse to serve as prompts for individual characters, while character-level bounding box prompts suffer from over-segmentation and under-segmentation. In this paper, we propose an automatic annotation pipeline named Char-SAM, which turns SAM into a low-cost segmentation annotator with a character-level visual prompt. Specifically, leveraging existing text detection datasets with word-level bounding box annotations, we first generate finer-grained character-level bounding box prompts using the Character Bounding-box Refinement (CBR) module. Next, in the Character Glyph Refinement (CGR) module, we employ glyph information corresponding to text character categories as a new prompt to guide SAM in producing more accurate segmentation masks, addressing the over-segmentation and under-segmentation issues. These modules fully utilize the bbox-to-mask capability of SAM to generate high-quality text segmentation annotations automatically. Extensive experiments on TextSeg validate the effectiveness of Char-SAM. Its training-free nature also enables the generation of high-quality scene text segmentation datasets from real-world datasets such as COCO-Text and MLT17.
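To make the word-to-character prompt step concrete, the sketch below shows the simplest possible baseline: splitting a horizontal word-level box into equal-width character boxes, one per character of the transcription. This is not the CBR module, whose internals the abstract does not describe. The function name `split_word_box`, the `(x_min, y_min, x_max, y_max)` box layout, and the equal-width assumption are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def split_word_box(word_box: Box, text: str) -> List[Tuple[str, Box]]:
    """Naively split a horizontal word box into equal-width character boxes.

    Illustrative baseline only: it assumes horizontal text and uniform
    character widths, which real scene text often violates.
    """
    x_min, y_min, x_max, y_max = word_box
    if not text:
        return []
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("word_box must have positive width and height")

    char_width = (x_max - x_min) / len(text)
    char_boxes = []
    for i, ch in enumerate(text):
        left = x_min + i * char_width
        char_boxes.append((ch, (left, y_min, left + char_width, y_max)))
    return char_boxes


# Each (character, box) pair could then serve as a coarse box prompt.
char_prompts = split_word_box((10.0, 20.0, 50.0, 40.0), "SAM!")
```

Equal-width boxes like these are exactly the kind of coarse character prompt that can cut through or miss strokes, which is the motivation the abstract gives for refining boxes and adding glyph guidance.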