🤖 AI Summary
Conventional RGB-based OCR systems on smart glasses suffer from motion blur and limited dynamic range in low-light, high-speed motion scenarios, and their dense frame capture imposes excessive bandwidth and power costs, severely degrading text recognition performance.
Method: This paper proposes the first foveated OCR framework for smart glasses leveraging eye-tracking–guided event streams. It employs an event camera to capture sparse, asynchronous visual signals and dynamically foveates on the gaze-centered region using real-time eye-tracking data to suppress redundancy. A deep binary neural network enables efficient event-to-frame reconstruction, trained on synthetically generated data to enhance robustness in low-light and motion-blurred scenarios. Additionally, a multimodal large language model (MLLM) is integrated to improve semantic text understanding.
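As a point of reference for the reconstruction step, the sketch below shows a naive, non-learned event-to-frame accumulation that simply marks pixels where events fired. The paper replaces this with a learned deep binary reconstruction network; the event layout and function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def events_to_binary_frame(events: np.ndarray, height: int, width: int) -> np.ndarray:
    """Naive baseline: events is an (N, 4) array of [x, y, timestamp, polarity];
    returns a binary HxW frame marking pixels that fired at least one event."""
    frame = np.zeros((height, width), dtype=np.uint8)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    frame[ys, xs] = 1  # set any pixel that received an event in the window
    return frame

# Toy stream: two events at pixel (x=2, y=1) and one at (x=0, y=0).
ev = np.array([[2, 1, 0.0, 1], [2, 1, 0.1, -1], [0, 0, 0.2, 1]])
frame = events_to_binary_frame(ev, 3, 4)
print(int(frame.sum()))  # 2 distinct pixels fired
```

A learned reconstruction consumes the same sparse input but produces an intensity-like image suitable for OCR, which is where the synthetic training data comes in.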
Results: Compared to a wearable RGB frame camera, the system reduces bandwidth by around 98% (up to 2400 times less data), lowering power consumption and extending battery life while enabling real-time operation, thereby overcoming fundamental OCR limitations of traditional vision systems under extreme conditions.
📝 Abstract
Current smart glasses equipped with RGB cameras struggle to perceive the environment in low-light and high-speed motion scenarios due to motion blur and the limited dynamic range of frame cameras. Additionally, capturing dense images with a frame camera requires large bandwidth and power consumption, consequently draining the battery faster. These challenges are especially relevant for developing algorithms that can read text from images. In this work, we propose a novel event-based Optical Character Recognition (OCR) approach for smart glasses. By using the eye gaze of the user, we foveate the event stream to reduce bandwidth by around 98% while exploiting the benefits of event cameras in high-dynamic-range and fast scenes. Our proposed method performs deep binary reconstruction trained on synthetic data and leverages multimodal LLMs for OCR, outperforming traditional OCR solutions. Our results demonstrate the ability to read text in low-light environments where RGB cameras struggle, while using up to 2400 times less bandwidth than a wearable RGB camera.
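The gaze-driven foveation described above amounts to keeping only events near the current gaze point. The following is a minimal sketch of that filtering step, assuming an (x, y, timestamp, polarity) event layout and a square foveal window; the window size and function names are illustrative, not taken from the paper.

```python
import numpy as np

def foveate_events(events: np.ndarray, gaze_xy: tuple,
                   half_window: int = 64) -> np.ndarray:
    """Keep only events inside a square window centered on the gaze point.

    events: (N, 4) array of [x, y, timestamp, polarity].
    gaze_xy: (x, y) gaze coordinates from the eye tracker.
    """
    gx, gy = gaze_xy
    x, y = events[:, 0], events[:, 1]
    keep = (np.abs(x - gx) <= half_window) & (np.abs(y - gy) <= half_window)
    return events[keep]

# Toy stream: gaze at (100, 100) with a 64-pixel half window.
stream = np.array([
    [100, 100, 0.001, 1],   # at gaze center  -> kept
    [160, 100, 0.002, -1],  # 60 px away      -> kept
    [300, 300, 0.003, 1],   # far peripheral  -> dropped
])
fov = foveate_events(stream, (100, 100))
print(len(fov))  # 2 of 3 events survive foveation
```

Because event cameras only emit data where intensity changes, discarding peripheral events in this way is what drives the large bandwidth reduction reported above.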