Words as Geometric Features: Estimating Homography using Optical Character Recognition as Compressed Image Representation

📅 2025-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional image-based document alignment fails when the original document images are unavailable because of privacy constraints, storage limitations, or bandwidth restrictions. To address this, the paper proposes an image-free homography estimation method that relies solely on OCR output (word tokens and their bounding box coordinates). The core idea is to treat OCR results directly as compact geometric features that also carry textual content, combining word-level spatial relationship modeling, RANSAC-based robust matching, and homography optimization guided by text coordinates. On a set of test documents, the approach achieves higher alignment accuracy than image-feature-based methods (e.g., SIFT+RANSAC) while reducing computational overhead and storage requirements. The authors present it as an efficient and practical option for document alignment in privacy-sensitive and resource-constrained scenarios.

📝 Abstract

Document alignment and registration play a crucial role in numerous real-world applications, such as automated form processing, anomaly detection, and workflow automation. Traditional methods for document alignment rely on image-based features like keypoints, edges, and textures to estimate geometric transformations, such as homographies. However, these approaches often require access to the original document images, which may not always be available due to privacy, storage, or transmission constraints. This paper introduces a novel approach that leverages Optical Character Recognition (OCR) outputs as features for homography estimation. By utilizing the spatial positions and textual content of OCR-detected words, our method enables document alignment without relying on pixel-level image data. This technique is particularly valuable in scenarios where only OCR outputs are accessible. Furthermore, the method is robust to OCR noise, incorporating RANSAC to handle outliers and inaccuracies in the OCR data. On a set of test documents, we demonstrate that our OCR-based approach is more accurate than traditional image-based methods, offering a more efficient and scalable solution for document registration tasks. The proposed method supports document processing applications while reducing reliance on high-dimensional image data.
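The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the matching rule (pair words whose text is unique in both documents), the use of box centers as point features, the assumption that coordinates are normalized to [0, 1] by page size, and all function names and parameters (`match_words`, `ransac_homography`, `iters`, `tol`) are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def center(box):
    """Center of an (x0, y0, x1, y1) bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def match_words(src_words, dst_words):
    """Pair box centers of words whose text occurs exactly once in each document."""
    src_counts = Counter(text for text, _ in src_words)
    dst_counts = Counter(text for text, _ in dst_words)
    dst_center = {t: center(b) for t, b in dst_words if dst_counts[t] == 1}
    return [(center(b), dst_center[t]) for t, b in src_words
            if src_counts[t] == 1 and t in dst_center]

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None  # degenerate system, e.g. collinear sample points
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_homography(pairs):
    """Least-squares homography (h33 fixed to 1) from 4 or more point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in pairs:
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    # Normal equations (A^T A) h = A^T b; adequate for normalized coordinates.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(AtA, Atb)
    return None if h is None else h + [1.0]

def project(h, pt):
    """Apply the row-major 3x3 homography h (flat list of 9) to a point."""
    x, y = pt
    w = h[6] * x + h[7] * y + h[8]
    if abs(w) < 1e-12:
        return (math.inf, math.inf)
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)

def ransac_homography(pairs, iters=500, tol=0.01, seed=0):
    """Return (homography, inlier pairs); homography is None if estimation fails."""
    if len(pairs) < 4:
        return None, []
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        h = fit_homography(rng.sample(pairs, 4))
        if h is None:
            continue
        inliers = []
        for s, d in pairs:
            px, py = project(h, s)
            if math.hypot(px - d[0], py - d[1]) < tol:
                inliers.append((s, d))
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 4:
        return None, best
    return fit_homography(best), best  # refit on all inliers
```

Each document is passed as a list of `(text, (x0, y0, x1, y1))` tuples, so calling `ransac_homography(match_words(src_words, dst_words))` needs no pixel data at all. Restricting matches to words that are unique in both documents is a deliberately simple choice: it avoids ambiguous pairings of frequent words, and RANSAC then discards pairs whose positions were corrupted by OCR errors.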

Problem

Research questions and friction points this paper is trying to address.

Estimating homography using OCR outputs instead of image data

Aligning documents without access to original images

Handling OCR noise robustly for accurate document registration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses OCR outputs for homography estimation

Leverages word positions and textual content

Uses RANSAC for robustness to OCR noise
