🤖 AI Summary
Existing generative super-resolution methods achieve strong performance on natural images but often distort character shapes when applied to text images, failing to preserve text readability and visual fidelity at the same time. To address this, we propose TIGER, the first "text-first, image-later" two-stage framework: Stage I employs a structure-aware network to reconstruct glyphs with high geometric accuracy; Stage II performs image super-resolution guided explicitly by the reconstructed glyphs. The key innovation is an explicit glyph-to-image guidance mechanism, which breaks the long-standing trade-off between readability and perceptual quality. To support extreme-scale text super-resolution (×14.29), we introduce UltraZoom-ST, the first benchmark dataset specifically designed for scene text under severe degradation. Extensive experiments demonstrate that TIGER achieves state-of-the-art performance across multiple quantitative metrics, significantly improving character legibility and global visual consistency.
📝 Abstract
Current generative super-resolution methods show strong performance on natural images but distort text, creating a fundamental trade-off between image quality and textual readability. To address this, we introduce TIGER (Text-Image Guided supEr-Resolution), a novel two-stage framework that breaks this trade-off through a "text-first, image-later" paradigm. TIGER explicitly decouples glyph restoration from image enhancement: it first reconstructs precise text structures and then uses them to guide subsequent full-image super-resolution. This glyph-to-image guidance ensures both high fidelity and visual consistency. To support comprehensive training and evaluation, we also contribute UltraZoom-ST (UltraZoom-Scene Text), the first scene text dataset with extreme zoom (×14.29). Extensive experiments show that TIGER achieves state-of-the-art performance, enhancing readability while preserving overall image quality.
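The "text-first, image-later" control flow described above can be illustrated with a toy sketch. This is not the TIGER implementation: the function names (`restore_glyphs`, `super_resolve`, `two_stage_sr`), the nearest-neighbour upsampler, the 0.5 ink threshold, and the blending weight are all hypothetical stand-ins for the learned networks, and an integer scale factor is assumed for simplicity even though the dataset targets ×14.29. It shows only the ordering: a glyph map is produced first, then passed as explicit guidance to the image stage.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of intensities in [0, 1]


def upscale_nearest(img: Image, factor: int) -> Image:
    """Placeholder upsampler: repeat every pixel `factor` times along both axes."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def restore_glyphs(lr: Image, factor: int) -> Image:
    """Stage I stand-in: return a high-resolution binary glyph map (1.0 = ink).

    Assumption: dark pixels (< 0.5) are text. A real system would use a
    learned, structure-aware network here.
    """
    up = upscale_nearest(lr, factor)
    return [[1.0 if px < 0.5 else 0.0 for px in row] for row in up]


def super_resolve(lr: Image, glyphs: Image, factor: int, weight: float = 0.8) -> Image:
    """Stage II stand-in: upscale the image, then darken pixels the glyph map marks as ink."""
    up = upscale_nearest(lr, factor)
    return [
        [(1 - weight * g) * px for px, g in zip(row, glyph_row)]
        for row, glyph_row in zip(up, glyphs)
    ]


def two_stage_sr(lr: Image, factor: int) -> Image:
    """Text first, image later: the glyph map from Stage I guides Stage II."""
    glyphs = restore_glyphs(lr, factor)
    return super_resolve(lr, glyphs, factor)


if __name__ == "__main__":
    low_res = [[0.2, 0.9], [0.9, 0.2]]
    high_res = two_stage_sr(low_res, factor=2)
    print(len(high_res), len(high_res[0]))
```

The point of the design is that Stage II never has to infer character shapes on its own; it receives them as an input, so errors in image enhancement cannot silently reshape the glyphs.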