🤖 AI Summary
This study evaluates optical character recognition (OCR) for blind and low-vision users under mobile conditions rather than on static datasets alone. Four OCR engines (Google Vision, PaddleOCR 3.0, EasyOCR, and Tesseract) were benchmarked in static tests across distances of 1–7 meters and horizontal viewing angles of 0–75 degrees, using a smartphone (main and ultra-wide cameras) and smart glasses. PaddleOCR 3.0 was then used in dynamic tests at walking speeds from 0.8 m/s to 1.8 m/s with head-mounted, shoulder-mounted, and hand-held cameras. Character-level accuracy, computed with the Levenshtein ratio against manually defined ground truth, declined with increased walking speed and wider viewing angles. Google Vision achieved the highest overall accuracy, with PaddleOCR close behind as the strongest open-source alternative. The phone's main camera was the most accurate across devices, and shoulder mounting yielded the highest average accuracy among body positions, although the differences among shoulder, head, and hand were not statistically significant.
📝 Abstract
Optical character recognition (OCR), which converts printed or handwritten text into machine-readable form, is widely used in assistive technology for people with blindness and low vision. Yet, most evaluations rely on static datasets that do not reflect the challenges of mobile use. In this study, we systematically evaluated OCR performance under both static and dynamic conditions. Static tests measured detection range across distances of 1-7 meters and viewing angles of 0-75 degrees horizontally. Dynamic tests examined the impact of motion by varying walking speed from slow (0.8 m/s) to very fast (1.8 m/s) and comparing three camera mounting positions: head-mounted, shoulder-mounted, and hand-held. We evaluated both a smartphone and smart glasses, using the phone's main and ultra-wide cameras. Four OCR engines were benchmarked to assess accuracy at different distances and viewing angles: Google Vision, PaddleOCR 3.0, EasyOCR, and Tesseract. PaddleOCR 3.0 was then used to evaluate accuracy at different walking speeds. Accuracy was computed at the character level using the Levenshtein ratio against manually defined ground truth. Results showed that recognition accuracy declined with increased walking speed and wider viewing angles. Google Vision achieved the highest overall accuracy, with PaddleOCR close behind as the strongest open-source alternative. Across devices, the phone's main camera achieved the highest accuracy, and a shoulder-mounted placement yielded the highest average among body positions; however, differences among shoulder, head, and hand were not statistically significant.
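As an illustration of the character-level metric mentioned above, the following is a minimal sketch of a Levenshtein-based accuracy score. The abstract does not state which normalization the authors used, so this sketch assumes accuracy = 1 - distance / max(length); some libraries define their "ratio" differently, for example by normalizing by the summed lengths and weighting substitutions more heavily. The function names `levenshtein_distance` and `char_accuracy` are hypothetical and not taken from the paper.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions all cost 1."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def char_accuracy(predicted: str, truth: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(predicted), len(truth))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    return 1.0 - levenshtein_distance(predicted, truth) / longest


if __name__ == "__main__":
    # Hypothetical OCR output compared with a hand-labeled sign text.
    print(char_accuracy("EX1T", "EXIT"))
```

Under this assumed normalization, one wrong character in a four-character sign costs a quarter of the score, so short texts are penalized more heavily per error than long ones.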