LATTE: Improving Latex Recognition for Tables and Formulae with Iterative Refinement

📅 2024-09-21

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of recovering LaTeX source for complex formulae and tables from PDF-rendered images. It proposes LATTE, the first iterative refinement framework for LaTeX recognition. The core idea is delta-view feedback: the extracted LaTeX source is rendered, and the rendered image is compared against the expected image to pinpoint their differences. This feedback drives two models: a fault localization model that identifies the faulty parts of an incorrect recognition, and a LaTeX refinement model that repairs them. On both formulae and tables, LATTE outperforms existing techniques and GPT-4V by at least 7.03% in exact match, with a successful refinement rate of 46.08% for formulae and 25.51% for tables.

📝 Abstract

Portable Document Format (PDF) files are predominantly used for storing and disseminating scientific research, legal documents, and tax information. LaTeX is a popular application for creating PDF documents. Despite its advantages, LaTeX is not WYSIWYG (what you see is what you get): the LaTeX source and the rendered PDF images look drastically different, especially for formulae and tables. This gap makes it hard to modify or export LaTeX sources for formulae and tables from PDF images, and existing work is still limited. First, prior work generates LaTeX sources in a single iteration and struggles with complex LaTeX formulae. Second, existing work mainly recognizes and extracts LaTeX sources for formulae, and is either unable to handle tables or ineffective on them. This paper proposes LATTE, the first iterative refinement framework for LaTeX recognition. Specifically, we propose delta-view as feedback, which compares the rendered image of the extracted LaTeX source with the expected correct image and pinpoints the differences between them. Such delta-view feedback enables our fault localization model to localize the faulty parts of the incorrect recognition more accurately and enables our LaTeX refinement model to repair the incorrect extraction more accurately. LATTE improves the LaTeX source extraction accuracy of both LaTeX formulae and tables, outperforming existing techniques as well as GPT-4V by at least 7.03% in exact match, with a successful refinement rate of 46.08% (formula) and 25.51% (table).
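To make the delta-view idea concrete, here is a minimal sketch of comparing a rendered image with the expected one. It is an illustration under stated assumptions, not the paper's implementation: images are plain nested lists of grayscale pixel values, and the function name `delta_view`, its `threshold` parameter, and the bounding-box output are all hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple

# Assumption: an image is a list of rows of grayscale values (0 = ink, 255 = background).
Image = List[List[int]]
BBox = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def delta_view(rendered: Image, expected: Image,
               threshold: int = 0) -> Tuple[List[List[bool]], Optional[BBox]]:
    """Return a per-pixel difference mask and the bounding box of the differences.

    Hypothetical helper: a plain pixel-wise comparison standing in for the
    delta-view feedback described in the abstract.
    """
    same_shape = len(rendered) == len(expected) and all(
        len(r) == len(e) for r, e in zip(rendered, expected)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")

    # True wherever the two renderings disagree by more than the threshold.
    mask = [
        [abs(a - b) > threshold for a, b in zip(r_row, e_row)]
        for r_row, e_row in zip(rendered, expected)
    ]

    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return mask, None  # renderings match, nothing to localize

    cols = [x for row in mask for x, differs in enumerate(row) if differs]
    return mask, (min(cols), rows[0], max(cols), rows[-1])
```

The bounding box gives a coarse region in which the two renderings disagree, which is the kind of signal a fault localization step could consume. A real pipeline would first need to render LaTeX to pixels and align the two images, both of which this sketch leaves out.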

Problem

Research questions and friction points this paper is trying to address.

Improving LaTeX recognition accuracy

Handling complex LaTeX formulae

Enhancing table extraction effectiveness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative refinement framework

Delta-view feedback mechanism

Enhanced LaTeX source extraction
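
The contributions listed above fit together as a loop: recognize, render, compare, localize, refine, repeat. The sketch below shows only that control flow. Every callable (`recognize`, `render`, `localize`, `refine`) is a hypothetical placeholder rather than a LATTE API, and the toy stand-ins operate on strings instead of images so that the example stays self-contained.

```python
from typing import Any, Callable, Tuple


def iterative_refine(image: Any,
                     recognize: Callable[[Any], str],
                     render: Callable[[str], Any],
                     localize: Callable[[str, Any, Any], Any],
                     refine: Callable[[str, Any, Any, Any], str],
                     max_rounds: int = 3) -> Tuple[str, int]:
    """Return (latex_source, refinement_rounds_used).

    Assumption: all four callables are placeholders for the recognition,
    rendering, fault localization, and refinement components.
    """
    source = recognize(image)  # initial single-pass recognition
    for rounds_used in range(max_rounds):
        rendered = render(source)
        if rendered == image:  # rendering matches the expected image, so stop
            return source, rounds_used
        fault = localize(source, rendered, image)  # where the source is wrong
        source = refine(source, fault, rendered, image)  # repair that part
    return source, max_rounds


# Toy stand-ins: the "image" is just the correct string, and rendering is the identity.
def toy_recognize(image: str) -> str:
    return "x^2+v"  # deliberately wrong in the last character


def toy_render(source: str) -> str:
    return source


def toy_localize(source: str, rendered: str, image: str) -> int:
    # Index of the first mismatching character between the two "renderings".
    return next(i for i, (a, b) in enumerate(zip(rendered, image)) if a != b)


def toy_refine(source: str, fault: int, rendered: str, image: str) -> str:
    # Replace the faulty character with the one seen in the expected "image".
    return source[:fault] + image[fault] + source[fault + 1:]


result, rounds = iterative_refine("x^2+y", toy_recognize, toy_render,
                                  toy_localize, toy_refine)
```

The loop stops as soon as the rendering of the current source matches the expected image, so a correct first-pass recognition costs no refinement rounds. The `max_rounds` cap is an assumption of this sketch that keeps it from looping forever when refinement fails to converge.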
