🤖 AI Summary
This work addresses the limitations of existing scene text recognition methods in handling short, fragmented, or blurry text, and their failure to exploit external linguistic knowledge. To overcome these challenges, the authors propose TiCLS, an end-to-end recognition model that tightly couples a character-level pretrained language model with visual features through a dedicated language decoder, which explicitly aligns and fuses visual and linguistic information. By using the pretrained language model to guide multimodal fusion, TiCLS substantially improves robustness when recognizing low-quality text. Extensive experiments on the ICDAR 2015 and Total-Text benchmarks demonstrate state-of-the-art performance, validating the benefit of explicitly incorporating linguistic knowledge into the recognition pipeline.
📝 Abstract
Scene text spotting aims to detect and recognize text in real-world images, where instances are often short, fragmented, or visually ambiguous. Existing methods rely primarily on visual cues and implicitly capture local character dependencies, but overlook the benefits of external linguistic knowledge. Prior attempts to integrate language models either adapt language-modeling objectives without external knowledge or apply pretrained models that are misaligned with the word-level granularity of scene text. We propose TiCLS, an end-to-end text spotter that explicitly incorporates external linguistic knowledge from a character-level pretrained language model (PLM). TiCLS introduces a linguistic decoder that fuses visual and linguistic features yet can be initialized from a pretrained language model, enabling robust recognition of ambiguous or fragmented text. Experiments on ICDAR 2015 and Total-Text demonstrate that TiCLS achieves state-of-the-art performance, validating the effectiveness of PLM-guided linguistic integration for scene text spotting.
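The abstract describes a decoder that fuses visual and linguistic features at the character level. One common way such fusion is realized is scaled dot-product cross-attention, where character-position queries (e.g. from a language branch) attend over flattened visual features. The sketch below illustrates that generic mechanism only; the function name, shapes, and sizes are illustrative assumptions, not the actual TiCLS architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d):
    # queries:     (T, d) hypothetical character-position queries
    #              from the linguistic branch
    # keys_values: (N, d) hypothetical visual features from the
    #              image encoder, flattened over spatial positions
    scores = queries @ keys_values.T / np.sqrt(d)   # (T, N) attention logits
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ keys_values                    # (T, d) fused features

rng = np.random.default_rng(0)
T, N, d = 8, 64, 32           # 8 character slots, 64 visual tokens (assumed sizes)
q = rng.normal(size=(T, d))   # stand-in for PLM-initialized character queries
kv = rng.normal(size=(N, d))  # stand-in for the visual feature map
fused = cross_attention(q, kv, d)
print(fused.shape)            # (8, 32): one fused vector per character slot
```

In a full spotter, the fused `(T, d)` features would be projected to per-character class logits; initializing the query branch from a character-level PLM is what lets linguistic priors guide this attention.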