Cascaded Robust Rectification for Arbitrary Document Images

📅 2025-11-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address severe geometric distortions in document images caused by arbitrary camera viewpoints and physical paper deformation in real-world scenarios, this paper proposes a progressive multi-stage rectification framework that sequentially corrects perspective, geometric, and content-level distortions. The complex non-rigid transformation is decoupled into three cascaded stages: a global affine transformation, rectification of geometric deformations caused by paper curling and folding, and content-aware iterative refinement, giving coarse-to-fine, robust rectification. The paper also introduces two enhanced evaluation metrics: layout-aligned OCR metrics (AED/ACER), which decouple geometric rectification quality from the layout analysis errors of OCR engines, and masked geometric measures (AD-M/AAD-M), designed for documents with incomplete boundaries. Experiments on multiple challenging benchmarks show state-of-the-art performance, with the AAD metric reduced by 14.1%–34.7%.

📝 Abstract

Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex transformations can be decomposed and resolved progressively, we introduce a novel multi-stage framework that progressively reverses distinct distortion types in a coarse-to-fine manner. Specifically, our framework first performs a global affine transformation to correct perspective distortions arising from the camera's viewpoint, then rectifies geometric deformations resulting from physical paper curling and folding, and finally employs a content-aware iterative process to eliminate fine-grained content distortions. To address limitations in existing evaluation protocols, we also propose two enhanced metrics: layout-aligned OCR metrics (AED/ACER) for a stable assessment that decouples geometric rectification quality from the layout analysis errors of OCR engines, and masked AD/AAD (AD-M/AAD-M) tailored for accurately evaluating geometric distortions in documents with incomplete boundaries. Extensive experiments show that our method establishes new state-of-the-art performance on multiple challenging benchmarks, yielding a substantial reduction of 14.1%–34.7% in the AAD metric and demonstrating superior efficacy in real-world applications. The code will be publicly available at https://github.com/chaoyunwang/ArbDR.
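
The abstract does not define AED/ACER. Judging by the acronyms, they are presumably layout-aligned variants of the edit distance (ED) and character error rate (CER) often reported for OCR output, but that reading is an assumption. As background only, here is a minimal sketch of plain ED and CER. The function names are hypothetical, and the layout-alignment step that the paper adds is not reproduced.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # delete from reference
                            curr[j - 1] + 1,      # insert into reference
                            prev[j - 1] + substitution))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `character_error_rate(ground_truth_text, ocr_text)` compares the OCR output of a rectified image with the ground-truth transcription. When the OCR engine reads text blocks in a different order, this plain score rises even if the geometry is well corrected, which is the instability the layout-aligned metrics are said to address.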

Problem

Research questions and friction points this paper is trying to address.

Correcting perspective and physical distortions in document images

Developing a multi-stage framework for coarse-to-fine rectification

Proposing enhanced metrics for accurate geometric distortion evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage coarse-to-fine framework for document rectification

Global affine transformation correcting camera perspective distortions

Content-aware iterative process eliminating fine-grained content distortions
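
The coarse-to-fine cascade listed above can be illustrated with a short NumPy sketch. It is not the authors' implementation: the paper's stages presumably predict their parameters from the image, whereas here the affine matrix, the displacement field, and the `predict_flow` callback are hand-supplied, hypothetical stand-ins that only show how three stages chain together.

```python
import numpy as np


def identity_grid(height, width):
    """(height, width, 2) array of (row, col) coordinates."""
    rows, cols = np.meshgrid(np.arange(height, dtype=float),
                             np.arange(width, dtype=float), indexing="ij")
    return np.stack([rows, cols], axis=-1)


def warp_backward(image, grid):
    """Nearest-neighbour sampling of `image` at the coordinates in `grid`."""
    height, width = image.shape[:2]
    rows = np.clip(np.rint(grid[..., 0]).astype(int), 0, height - 1)
    cols = np.clip(np.rint(grid[..., 1]).astype(int), 0, width - 1)
    return image[rows, cols]


def affine_stage(image, matrix, offset):
    """Stage 1 stand-in: output pixel p samples the input at matrix @ p + offset."""
    height, width = image.shape[:2]
    grid = identity_grid(height, width) @ np.asarray(matrix, dtype=float).T
    return warp_backward(image, grid + np.asarray(offset, dtype=float))


def displacement_stage(image, flow):
    """Stage 2 stand-in: dense per-pixel displacement of shape (height, width, 2)."""
    height, width = image.shape[:2]
    return warp_backward(image, identity_grid(height, width) + flow)


def iterative_stage(image, predict_flow, steps=3):
    """Stage 3 stand-in: repeatedly predict and apply a small residual flow."""
    for _ in range(steps):
        image = displacement_stage(image, predict_flow(image))
    return image


def rectify(image, matrix, offset, flow, predict_flow, steps=3):
    """Chain the three stages from coarse (global) to fine (residual)."""
    out = affine_stage(image, matrix, offset)
    out = displacement_stage(out, flow)
    return iterative_stage(out, predict_flow, steps)
```

Each stage receives the output of the previous one, so later stages only need to model the distortion that remains. That is the motivation the abstract gives for decomposing the transformation.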
