🤖 AI Summary
This study addresses the significant performance degradation of scene text recognition under low-light conditions, primarily caused by insufficient illumination and noise interference. It presents the first systematic investigation of this problem, introducing LSTR—the first large-scale low-light text dataset—and ESTR, a real-world nighttime evaluation benchmark. The authors propose a text-aware joint training framework that integrates OCR fine-tuning, LoRA adaptation, and a novel re-rendering-based low-light image enhancement module (RLLIE). Experimental results demonstrate that standalone image enhancement or OCR optimization yields limited gains, whereas the proposed joint training strategy substantially improves recognition accuracy in real low-light scenarios, establishing a comprehensive benchmark and an effective methodology for future research.
📝 Abstract
Accurate text recognition in low-light environments is essential for intelligent systems in applications ranging from autonomous vehicles to smart surveillance. However, the challenges it poses, such as poor illumination and noise interference, remain underexplored. To address this gap, we introduce LSTR, a large-scale Low-light Scene Text Recognition dataset comprising 11,273 low-light images generated from well-lit datasets (ICDAR2015, IIIT5K, and WordArt), along with ESTR, a set of 60 real nighttime street-scene images in English and Spanish reserved exclusively for evaluation. We explore two solution strategies: (1) adapting Optical Character Recognition (OCR) models through fine-tuning or LoRA-based fine-tuning, and (2) a joint training strategy that integrates a low-light image enhancement (LLIE) module with an OCR model. In particular, we propose a novel re-render LLIE (RLLIE) module, which demonstrates improved performance on real-world data. Through extensive experimentation, we analyze various training strategies and address a key research question: "How bright is bright enough for effective scene text recognition?" Our results indicate that standalone LLIE or OCR models perform inadequately under low-light conditions, highlighting the advantages of specialized, jointly trained text-centric approaches. Additionally, we provide a comprehensive benchmark to support future research in robust low-light scene text recognition. The dataset is available at https://huggingface.co/datasets/lumimusta/Low-light_Scene_Text_Dataset.
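For readers unfamiliar with LoRA-based fine-tuning, the following is a minimal NumPy sketch of the general idea: the pretrained weight stays frozen and only a low-rank update is trained. It is not the authors' implementation. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the initialization values are all illustrative assumptions, and a real OCR model would apply this to selected layers inside a deep-learning framework.

```python
import numpy as np


class LoRALinear:
    """Hypothetical frozen linear layer with a low-rank update: W + (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        # Only A and B would be updated during fine-tuning.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))  # zero init: the update starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, in_dim); the low-rank path is added to the frozen path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # After training, the update can be folded back into a single matrix.
        return self.weight + self.scale * self.B @ self.A
```

With a 3x4 weight and rank 2, the trainable parameters number 2*4 + 3*2 = 14 rather than 12 for the full matrix; the saving only becomes meaningful when the rank is much smaller than the layer dimensions, as is typical for large recognition models.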