🤖 AI Summary
Mathematical documents, particularly those rich in LaTeX-formatted equations, suffer from poor speech accessibility because existing TTS systems misread formulas, omit them, or lose their meaning. To address this, we propose the first end-to-end OCR-T5-TTS framework, which integrates optical character recognition (OCR) for mathematical notation, a fine-tuned T5 model that translates formulas into natural-language descriptions, and text-to-speech synthesis. This unified pipeline produces audio directly from document images while preserving the mathematical meaning of each formula. Evaluated on a benchmark dataset of academic papers containing complex mathematical expressions, our system achieves a word error rate (WER) of 0.281, a relative reduction of about 45% against Microsoft Edge (WER 0.510) and about 54% against Adobe Acrobat (WER 0.617), improving accessibility for mathematically intensive content.
📝 Abstract
TTS (Text-to-Speech) document readers from Microsoft, Adobe, Apple, and OpenAI are in service worldwide. They produce relatively good results for general plain text, but they sometimes skip content or give unsatisfactory results for mathematical expressions. Most modern academic papers are written in LaTeX, and compiled LaTeX formulas are rendered as distinctive text forms within the document. Traditional TTS document readers output this text exactly as it is recognized, without considering the mathematical meaning of the formulas. To address this issue, we propose MathReader, which integrates OCR, a fine-tuned T5 model, and TTS. On documents containing mathematical formulas, MathReader achieved a lower Word Error Rate (WER) than existing TTS document readers: 0.281, compared with 0.510 for Microsoft Edge and 0.617 for Adobe Acrobat. This will significantly reduce the inconvenience faced by users who want to listen to documents, especially those who are visually impaired. The code is available at https://github.com/hyeonsieun/MathReader.
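To make the reported numbers concrete, the sketch below computes WER with the standard word-level edit-distance definition and then the relative reduction implied by the figures above. It is an illustration only: the function names `word_error_rate` and `relative_reduction` are hypothetical, the example sentences are invented, and the paper's exact evaluation protocol (for example, text normalization before scoring) is not stated in the abstract and is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Invented example: one substituted word out of four reference words.
    print(word_error_rate("x squared plus one", "x two plus one"))

    # WER values taken from the abstract.
    print(f"vs. Microsoft Edge: {relative_reduction(0.510, 0.281):.1%}")
    print(f"vs. Adobe Acrobat:  {relative_reduction(0.617, 0.281):.1%}")
```

With the abstract's values, the relative reductions come out to roughly 45% against Microsoft Edge and 54% against Adobe Acrobat.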