🤖 AI Summary
Existing text detection methods often lose edge features during the mask shrink-and-expand procedure and struggle to distinguish foreground from background, which limits their accuracy on arbitrarily shaped text. This work proposes the Text-Pass Filter (TPF) framework, which brings the idea of band-pass filtering to text detection. TPF constructs a dedicated feature-filter pair for each text instance and segments complete text regions directly, removing the need for complex post-processing. The framework also incorporates a Reinforcement Ensemble Unit (REU) to enhance feature consistency for elongated text and a Foreground Prior Unit (FPU) to improve foreground-background discrimination, so that adjacent or touching text instances separate naturally. Experiments demonstrate the effectiveness of REU and FPU and the advantage of TPF over prior methods, with a pipeline simple enough for real-time detection.
📝 Abstract
To assemble text instances efficiently, existing methods detect text via a shrink-mask expansion strategy. However, the shrinking operation discards the visual features of text margins and blurs the difference between foreground and background, which intrinsically limits how well text features can be recognized. To address this issue, we design the Text-Pass Filter (TPF) for arbitrary-shaped text detection. TPF segments the whole text region directly, which avoids these limitations. Notably, unlike previous methods based on whole text regions, TPF separates adhesive (touching) texts naturally, without complex decoding or post-processing, which makes real-time text detection possible. Concretely, we observe that a band-pass filter passes components within a specified band of frequencies, called its passband, and blocks components with frequencies above or below that band. This offers a natural way to extract each whole text separately. By simulating the band-pass filter, TPF constructs a unique feature-filter pair for each text. At inference, each filter extracts its matched text by passing that text's pass-feature and blocking all other features. Meanwhile, because the large aspect ratio of ribbon-like texts makes it hard to recognize them as a whole, a Reinforcement Ensemble Unit (REU) is designed to enhance feature consistency within the same text and to enlarge the filter's recognition field. Furthermore, a Foreground Prior Unit (FPU) is introduced to encourage TPF to discriminate between foreground and background, which improves the quality of the feature-filter pairs. Experiments demonstrate the effectiveness of REU and FPU and show the superiority of TPF.
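The feature-filter pairing described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how filters are built or applied, so the sketch assumes each filter is a vector that is dotted with per-pixel feature embeddings, followed by a sigmoid and a fixed threshold. The function name `extract_instances`, the hand-picked `prototypes`, and the toy label map are all hypothetical and exist only to show how one filter can pass its own text and block everything else, including a touching neighbor.

```python
import numpy as np


def extract_instances(features, filters, threshold=0.5):
    """Apply one filter per text instance to a shared feature map.

    features: (C, H, W) per-pixel embeddings (assumed representation).
    filters:  (K, C), one vector per text instance (assumed representation).
    Returns a (K, H, W) boolean array, one mask per filter.
    """
    c, h, w = features.shape
    # Response of every filter at every pixel, shape (K, H*W).
    logits = filters @ features.reshape(c, h * w)
    probs = 1.0 / (1.0 + np.exp(-logits))
    # A pixel "passes" a filter when its response exceeds the threshold.
    return (probs > threshold).reshape(len(filters), h, w)


# Toy scene: two text instances touching each other on the middle row.
labels = np.zeros((3, 6), dtype=int)  # 0 = background
labels[1, :3] = 1                     # text A
labels[1, 3:] = 2                     # text B, directly adjacent to A

# Hand-picked embeddings standing in for learned features (illustrative only).
prototypes = np.array([
    [-4.0, -4.0],  # background
    [4.0, -4.0],   # text A
    [-4.0, 4.0],   # text B
])
features = prototypes[labels].transpose(2, 0, 1)  # (C=2, H=3, W=6)

# One filter per text; each responds positively only to its own feature.
filters = np.array([
    [1.0, 0.0],  # passes text A, blocks text B and background
    [0.0, 1.0],  # passes text B, blocks text A and background
])

masks = extract_instances(features, filters)
print(masks.astype(int))
```

Under these assumptions each mask covers exactly one text, and the two touching texts come out as separate masks with no decoding step beyond the threshold, which is the behavior the abstract attributes to the feature-filter pairs.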