Cascaded Robust Rectification for Arbitrary Document Images

📅 2025-11-28
🤖 AI Summary
To address severe geometric distortions in document images caused by arbitrary camera viewpoints and physical paper deformation in real-world scenarios, this paper proposes a progressive multi-stage rectification framework that sequentially corrects perspective, geometric, and content-level distortions. Methodologically, the complex non-rigid transformation is decoupled into three cascaded modules: global affine alignment to correct camera perspective, rectification of geometric deformations caused by paper curling and folding, and content-aware iterative optimization to remove fine-grained content distortions. Together, these modules enable robust coarse-to-fine rectification. The paper also introduces two evaluation metrics: layout-aligned OCR metrics (AED/ACER), which decouple geometric rectification quality from the layout analysis errors of OCR engines, and masked geometric measures (AD-M/AAD-M), which accurately evaluate documents with incomplete boundaries. Extensive experiments on multiple challenging benchmarks demonstrate state-of-the-art performance, with the AAD metric reduced by 14.1%–34.7% compared with prior methods, along with better document readability and downstream OCR results.
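The cascade can be pictured as follows. Suppose each stage is a backward map that tells every output pixel where to sample in that stage's input. Chaining the three stages then amounts to composing those maps in reverse order. The Python sketch below assumes this view. The stage functions (`affine_stage`, `deform_stage`, `refine_stage`) and `compose_backward` are hypothetical toy stand-ins, not the paper's networks or code.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
# A backward map sends a pixel of a stage's output to the point it samples in that stage's input.
BackwardMap = Callable[[Point], Point]


def compose_backward(*stages: BackwardMap) -> BackwardMap:
    """Compose per-stage backward maps, given in the order the stages run.

    If stage k produces R_k(p) = R_{k-1}(m_k(p)), the final image samples the
    original photo at m_1(m_2(...m_n(p))), so the maps are applied last-to-first.
    """
    def total(p: Point) -> Point:
        for m in reversed(stages):
            p = m(p)
        return p
    return total


# Hypothetical stand-ins for the three stages (not the paper's learned models).
def affine_stage(p: Point) -> Point:   # coarse: global affine alignment
    x, y = p
    return (1.2 * x + 0.1 * y + 5.0, 0.05 * x + 1.1 * y + 3.0)


def deform_stage(p: Point) -> Point:   # medium: smooth, curl-like vertical offset
    x, y = p
    return (x, y + 0.002 * (x - 50.0) ** 2)


def refine_stage(p: Point) -> Point:   # fine: small residual correction
    x, y = p
    return (x + 0.5, y - 0.25)


full_map = compose_backward(affine_stage, deform_stage, refine_stage)
print(full_map((10.0, 20.0)))  # where output pixel (10, 20) samples the original photo
```

With the maps composed this way, the final image can be resampled once from the original photo instead of being interpolated after every stage. Whether the paper resamples once or per stage is not stated in this summary.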


📝 Abstract
Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex transformations can be decomposed and resolved progressively, we introduce a novel multi-stage framework that reverses distinct distortion types in a coarse-to-fine manner. Specifically, our framework first performs a global affine transformation to correct perspective distortions arising from the camera's viewpoint, then rectifies geometric deformations resulting from physical paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions. To address limitations in existing evaluation protocols, we also propose two enhanced metrics. The first, layout-aligned OCR metrics (AED/ACER), provides a stable assessment by decoupling geometric rectification quality from the layout analysis errors of OCR engines. The second, masked AD/AAD (AD-M/AAD-M), is tailored to accurately evaluate geometric distortions in documents with incomplete boundaries. Extensive experiments show that our method establishes new state-of-the-art performance on multiple challenging benchmarks, reducing the AAD metric by 14.1%–34.7% and demonstrating superior efficacy in real-world applications. The code will be publicly available at https://github.com/chaoyunwang/ArbDR.
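The motivation for AED/ACER can be illustrated with ordinary edit distance. This summary does not describe the exact alignment procedure. The sketch below therefore assumes that ground-truth and OCR text have already been paired per layout region, so an OCR engine that emits regions in the wrong order cannot inflate the score. `aligned_ed_cer` is a hypothetical name, and the sample lines are made up.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def aligned_ed_cer(pairs: List[Tuple[str, str]]) -> Tuple[int, float]:
    """Hypothetical region-aligned ED/CER.

    Each (ground_truth, ocr) pair is assumed to come from the same layout
    region, so reading-order mistakes across regions do not add to the error.
    """
    total_ed = sum(edit_distance(gt, ocr) for gt, ocr in pairs)
    total_chars = sum(len(gt) for gt, _ in pairs)
    return total_ed, total_ed / max(total_chars, 1)


gt_lines = ["Invoice 2024", "Total: 42.00"]
ocr_lines = ["Invoice 2O24", "Total: 42.00"]  # one O/0 confusion in the first region

# Page-level comparison when the OCR engine emits the two regions in reverse order.
page_ed = edit_distance("\n".join(gt_lines), "\n".join(reversed(ocr_lines)))
print(page_ed, aligned_ed_cer(list(zip(gt_lines, ocr_lines))))
```

In this toy case, the region-aligned score reflects only the single character error. The page-level score is dominated by the reading-order mismatch, which has nothing to do with rectification quality.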
Problem

Research questions and friction points this paper is trying to address.

Correcting perspective and physical distortions in document images
Developing a multi-stage framework for coarse-to-fine rectification
Proposing enhanced metrics for accurate geometric distortion evaluation
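The idea behind the masked AD-M/AAD-M variants is to score geometric error only where the document is actually present. This can be sketched as a masked average. The sketch is an assumption-laden illustration, not the paper's definition of AD or AAD. `masked_mean_distortion` is hypothetical, and the per-pixel distortion values and mask are invented for the example.

```python
from typing import Sequence


def masked_mean_distortion(distortion: Sequence[Sequence[float]],
                           mask: Sequence[Sequence[int]]) -> float:
    """Average per-pixel distortion over pixels where mask == 1 only.

    Hypothetical helper: `distortion` could hold displacement magnitudes from
    a dense alignment of the rectified image to the ground-truth scan, and
    `mask` marks pixels on the visible document (1) versus missing or
    out-of-frame regions (0).
    """
    total, count = 0.0, 0
    for d_row, m_row in zip(distortion, mask):
        for d, m in zip(d_row, m_row):
            if m:
                total += d
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return total / count


# Toy 2x3 grid: the right column lies outside the captured page, so the
# alignment there produces large, meaningless values.
distortion = [[1.0, 2.0, 30.0],
              [1.0, 2.0, 30.0]]
visible = [[1, 1, 0],
           [1, 1, 0]]
everything = [[1, 1, 1],
              [1, 1, 1]]

print(masked_mean_distortion(distortion, everything),
      masked_mean_distortion(distortion, visible))
```

In this sketch, the unmasked average is 11.0 and is driven almost entirely by the missing region, while the masked average of 1.5 reflects only the visible document.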
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage coarse-to-fine framework for document rectification
Global affine transformation correcting camera perspective distortions
Content-aware iterative process eliminating fine-grained content distortions
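As a rough illustration of a global affine stage, a 2×3 affine matrix can be fitted by least squares to point correspondences, such as detected page corners and the corners of an upright target rectangle. The paper's first stage is presumably learned rather than fitted this way. `fit_affine`, `apply_affine`, and `true_map` below are hypothetical helpers written with the standard library only. An affine fit can only approximate a true perspective warp, so in a coarse-to-fine cascade the remaining error would be left for later stages.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(a: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve the 3x3 system m @ x = v by Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point configuration (e.g. collinear points)")
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]  # replace one column with the right-hand side
        out.append(_det3(mc) / d)
    return out


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Least-squares 2x3 affine matrix A such that dst ≈ A @ [x, y, 1]."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least 3 matching point pairs")
    rows = [(float(x), float(y), 1.0) for x, y in src]
    # Normal equations: (M^T M) p = M^T b, shared by the x' and y' rows of A.
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    affine = []
    for k in range(2):  # k = 0 fits x', k = 1 fits y'
        mtb = [sum(r[i] * q[k] for r, q in zip(rows, dst)) for i in range(3)]
        affine.append(_solve3(mtm, mtb))
    return affine


def apply_affine(a: List[List[float]], p: Point) -> Point:
    x, y = p
    return (a[0][0] * x + a[0][1] * y + a[0][2],
            a[1][0] * x + a[1][1] * y + a[1][2])


def true_map(p: Point) -> Point:  # hypothetical distortion that produced the observed corners
    x, y = p
    return (0.9 * x + 0.2 * y + 12.0, -0.1 * x + 1.05 * y + 8.0)


dst = [(0.0, 0.0), (100.0, 0.0), (100.0, 140.0), (0.0, 140.0)]  # upright page corners
src = [true_map(p) for p in dst]                                  # observed corners
A = fit_affine(src, dst)
print([apply_affine(A, p) for p in src])  # should land on (close to) the upright corners
```

Least squares is used rather than an exact three-point solve so that extra or noisy correspondences are averaged out. With real, perspective-distorted corners the residuals would be non-zero.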
Chaoyun Wang
PhD student at Xi'an Jiaotong University
Geometric deep learning
Quanxin Huang
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an 710049, China
I-Chao Shen
The University of Tokyo
Computer Graphics, Machine Learning
Takeo Igarashi
The University of Tokyo
Computer Graphics
Nanning Zheng
Xi'an Jiaotong University
Caigui Jiang
Xi'an Jiaotong University
Computer graphics, architectural geometry