TADoc: Robust Time-Aware Document Image Dewarping

📅 2025-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Document images captured by portable devices often exhibit severe geometric distortions, including bending, folding, and rotation, which make flattening difficult. Conventional OCR evaluation metrics are also insensitive when a document contains little text. To address both issues, the paper proposes a time-aware dewarping paradigm that models document flattening as a progressive process with multiple intermediate states rather than a one-step transformation. It introduces TADoc (Time-Aware Document Dewarping Network), a lightweight network that integrates sequential state modeling with efficient parameterization, and Document Layout Similarity (DLS), a new layout-aware metric that quantifies how faithfully geometric structure is recovered and so covers the sparse-text cases where OCR metrics fall short. Experiments on several benchmarks with different document types and degrees of distortion indicate strong robustness and superior results, which in turn benefit downstream OCR and document understanding tasks.

📝 Abstract
Flattening curved, wrinkled, and rotated document images captured by portable photographing devices, termed document image dewarping, has become an increasingly important task with the rise of the digital economy and online work. Although many methods have been proposed recently, they often struggle to achieve satisfactory results when confronted with intricate document structures and higher degrees of deformation in real-world scenarios. Our main insight is that, unlike other document restoration tasks (e.g., deblurring), dewarping in real physical scenes is a progressive motion rather than a one-step transformation. Based on this, we make two key contributions. First, we reformulate the task, modeling it for the first time as a dynamic process that encompasses a series of intermediate states. Second, we design a lightweight framework called TADoc (Time-Aware Document Dewarping Network) to address the geometric distortion of document images. In addition, because OCR metrics are inadequate for document images containing sparse text, existing evaluation is not comprehensive. To address this shortcoming, we propose a new metric, DLS (Document Layout Similarity), to evaluate the effectiveness of document dewarping in downstream tasks. Extensive experiments and in-depth evaluations indicate that our model is strongly robust and achieves superior results on several benchmarks with different document types and degrees of distortion.
Problem

Research questions and friction points this paper is trying to address.

Flattening curved document images from photos
Handling intricate document structures and deformations
Evaluating dewarping effectiveness with the new DLS metric
Innovation

Methods, ideas, or system contributions that make the work stand out.

Models dewarping as a dynamic process with intermediate states
Introduces TADoc, a lightweight dewarping network
Proposes DLS metric for layout evaluation
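
The idea of treating dewarping as a progression through intermediate states, rather than a single jump, can be illustrated with a toy example. The sketch below is not the TADoc network and makes no claim about how the paper parameterizes its states: it assumes a small grid of control points and simply interpolates linearly between a warped grid (t = 0) and its flat target (t = 1). All function names and the toy distortion are hypothetical.

```python
# Illustrative sketch only: linear interpolation between a warped control grid
# and its flat target. This is NOT the TADoc model; all names are hypothetical.
from typing import List, Tuple

Point = Tuple[float, float]


def flat_grid(rows: int, cols: int) -> List[Point]:
    """Regular grid of control points on the unit square (the flat document)."""
    return [(c / (cols - 1), r / (rows - 1)) for r in range(rows) for c in range(cols)]


def intermediate_states(warped: List[Point], flat: List[Point], steps: int) -> List[List[Point]]:
    """Return steps + 1 grids going from the warped state (t = 0) to the flat state (t = 1)."""
    states = []
    for k in range(steps + 1):
        t = k / steps
        states.append([
            ((1 - t) * wx + t * fx, (1 - t) * wy + t * fy)
            for (wx, wy), (fx, fy) in zip(warped, flat)
        ])
    return states


def mean_displacement(grid: List[Point], flat: List[Point]) -> float:
    """Average Euclidean distance between a grid and the flat target."""
    total = sum(((gx - fx) ** 2 + (gy - fy) ** 2) ** 0.5
                for (gx, gy), (fx, fy) in zip(grid, flat))
    return total / len(flat)


flat = flat_grid(3, 3)
# Toy distortion (an assumption for illustration): shift points vertically by an amount that depends on x.
warped = [(x, y + 0.2 * x * (1 - x)) for x, y in flat]
states = intermediate_states(warped, flat, steps=4)
errors = [mean_displacement(s, flat) for s in states]
```

Here `errors` shrinks at every step, so each intermediate grid is closer to flat than the one before. A learned model would predict such states from the image instead of interpolating toward a known target.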
Fangmin Zhao
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Weichao Zeng
Institute of Information Engineering, Chinese Academy of Sciences
Computer Vision
Zhenhang Li
Institute of Information Engineering, CAS, China
Computer Vision, Image Generation
Dongbao Yang
Institute of Information Engineering, Chinese Academy of Sciences
Computer Vision
Yu Zhou
VCIP & TMCC & DISSec, College of Computer Science & College of Cryptology and Cyber Science, Nankai University