Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts

📅 2024-12-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

SAM performs poorly on scene text segmentation when prompted with bounding boxes: word-level boxes are too coarse for individual characters, while character-level boxes lead to over-segmentation and under-segmentation. This limits its usefulness as a text segmentation annotator. To address this, the authors propose Char-SAM, a training-free automatic annotation pipeline built on character-level visual prompts. It introduces two modules: Character Bounding-box Refinement (CBR), which converts the word-level bounding boxes found in existing text detection datasets into finer-grained character-level box prompts, and Character Glyph Refinement (CGR), which uses glyph information for each character category as an additional prompt to guide SAM toward more accurate masks. Both modules rely on SAM's bbox-to-mask capability and require no training. Experiments on TextSeg validate the approach, and its training-free nature allows it to generate high-quality text segmentation annotations from real-world datasets such as COCO-Text and MLT17, reducing the cost of fine-grained text annotation.

📝 Abstract

The recent emergence of the Segment Anything Model (SAM) enables various domain-specific segmentation tasks to be tackled cost-effectively by using bounding boxes as prompts. However, in scene text segmentation, SAM cannot achieve desirable performance. Word-level bounding boxes are too coarse to serve as prompts for characters, while character-level bounding box prompts suffer from over-segmentation and under-segmentation. In this paper, we propose an automatic annotation pipeline named Char-SAM that turns SAM into a low-cost segmentation annotator with character-level visual prompts. Specifically, leveraging existing text detection datasets with word-level bounding box annotations, we first generate finer-grained character-level bounding box prompts using the Character Bounding-box Refinement (CBR) module. Next, in the Character Glyph Refinement (CGR) module, we employ glyph information corresponding to text character categories as a new prompt to guide SAM in producing more accurate segmentation masks, addressing over-segmentation and under-segmentation. These modules fully utilize the bbox-to-mask capability of SAM to generate high-quality text segmentation annotations automatically. Extensive experiments on TextSeg validate the effectiveness of Char-SAM. Its training-free nature also enables the generation of high-quality scene text segmentation datasets from real-world datasets like COCO-Text and MLT17.
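
To make the word-level versus character-level prompt distinction concrete, here is a minimal sketch of turning one word-level box into per-character boxes. It is not the paper's CBR module: it assumes horizontal text and equal-width characters, and the function name `split_word_box` and the `(x_min, y_min, x_max, y_max)` box layout are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def split_word_box(word_box: Box, text: str) -> List[Tuple[str, Box]]:
    """Naively split a horizontal word box into equal-width character boxes.

    This is a stand-in for illustration, not the CBR module: real glyphs
    vary in width, and scene text is often rotated or curved.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return []
    x0, y0, x1, y1 = word_box
    width = (x1 - x0) / len(chars)
    return [
        (c, (x0 + i * width, y0, x0 + (i + 1) * width, y1))
        for i, c in enumerate(chars)
    ]


# One word-level annotation becomes four character-level box prompts.
boxes = split_word_box((10.0, 20.0, 50.0, 40.0), "STOP")
for char, box in boxes:
    print(char, box)
```

Each resulting box could then be given to SAM as a box prompt, one character at a time. The abstract's point is that box prompts alone still over- or under-segment, which is the gap the glyph prompt in CGR is meant to close.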

Problem

Research questions and friction points this paper is trying to address.

Semantic Segmentation

Character-level Text Segmentation

Bounding Box Optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Char-SAM

Character-level Bounding

Automatic Text Segmentation

🔎 Similar Papers

EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model