Restore Text First, Enhance Image Later: Two-Stage Scene Text Image Super-Resolution with Glyph Structure Guidance

📅 2025-10-24

🤖 AI Summary

Existing generative super-resolution methods perform well on natural images but often distort character shapes when applied to text images, so they fail to preserve text readability and visual fidelity at the same time. To address this, we propose TIGER, the first two-stage framework built on a “text-first, image-later” paradigm: Stage I uses a structure-aware network to reconstruct glyphs with high geometric accuracy, and Stage II performs full-image super-resolution explicitly guided by the reconstructed glyphs. The key innovation is this explicit glyph-to-image guidance mechanism, which breaks the long-standing trade-off between readability and perceptual quality. To support extreme-scale text super-resolution (×14.29), we introduce UltraZoom-ST, the first benchmark dataset designed specifically for scene text under severe degradation. Extensive experiments show that TIGER achieves state-of-the-art performance across multiple quantitative metrics, substantially improving character legibility and global visual consistency.

📝 Abstract

Current generative super-resolution methods show strong performance on natural images but distort text, creating a fundamental trade-off between image quality and textual readability. To address this, we introduce TIGER (Text-Image Guided supEr-Resolution), a novel two-stage framework that breaks this trade-off through a "text-first, image-later" paradigm. TIGER explicitly decouples glyph restoration from image enhancement: it first reconstructs precise text structures and then uses them to guide subsequent full-image super-resolution. This glyph-to-image guidance ensures both high fidelity and visual consistency. To support comprehensive training and evaluation, we also contribute UltraZoom-ST (UltraZoom-Scene Text), the first scene text dataset with extreme zoom (×14.29). Extensive experiments show that TIGER achieves state-of-the-art performance, enhancing readability while preserving overall image quality.
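The "text-first, image-later" data flow described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: restore_glyphs and enhance_image are hypothetical stand-ins for the Stage I structure-aware network and the Stage II glyph-guided super-resolution model. They use nearest-neighbour upsampling and a dark-pixel threshold instead of learned networks, and an integer scale factor rather than the dataset's ×14.29. What the sketch is meant to show is the dependency between the stages: Stage II consumes the glyph map produced by Stage I.

```python
from typing import List

Image = List[List[float]]  # grayscale image, values in [0, 1] (assumed representation)


def upsample_nearest(img: Image, scale: int) -> Image:
    """Nearest-neighbour upsampling by an integer factor (toy stand-in for a learned upsampler)."""
    return [
        [img[r // scale][c // scale] for c in range(len(img[0]) * scale)]
        for r in range(len(img) * scale)
    ]


def restore_glyphs(lr: Image, scale: int, threshold: float = 0.5) -> Image:
    """Hypothetical Stage I stand-in: a binary glyph map at the target resolution.

    A real structure-aware network would predict sharp character strokes; here we
    upsample and mark dark pixels as strokes (text assumed darker than background).
    """
    up = upsample_nearest(lr, scale)
    return [[1.0 if v < threshold else 0.0 for v in row] for row in up]


def enhance_image(lr: Image, glyphs: Image, scale: int, strength: float = 0.8) -> Image:
    """Hypothetical Stage II stand-in: full-image SR guided by the Stage I glyph map.

    Background pixels keep their upsampled value; pixels on predicted strokes are
    pulled toward ink (0.0) so the glyph structure from Stage I is preserved.
    """
    up = upsample_nearest(lr, scale)
    return [
        [v * (1.0 - strength * g) for v, g in zip(row_v, row_g)]
        for row_v, row_g in zip(up, glyphs)
    ]


def text_first_image_later(lr: Image, scale: int) -> Image:
    glyphs = restore_glyphs(lr, scale)       # Stage I: restore text structure first
    return enhance_image(lr, glyphs, scale)  # Stage II: enhance the image, guided by glyphs
```

For example, text_first_image_later([[0.9, 0.2], [0.9, 0.9]], 2) returns a 4×4 image in which the pixels covering the dark input pixel are pushed toward 0 while the light background keeps its upsampled value. In a real system both stages would be learned networks; the sketch only fixes their order and the interface between them.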

Problem

Research questions and friction points this paper is trying to address.

Generative super-resolution distorts text, degrading readability in super-resolved images

Glyph reconstruction and image enhancement are not decoupled in existing methods

Existing methods trade image quality against textual readability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework decouples glyph restoration from image enhancement

Explicit glyph structure guidance ensures high fidelity and visual consistency

New UltraZoom-ST dataset supports training and evaluation for extreme-zoom (×14.29) text super-resolution
