🤖 AI Summary
Document images captured by portable devices often exhibit severe geometric distortions, including bending, folding, and rotation, which make effective flattening challenging. Moreover, conventional OCR evaluation metrics lack sensitivity in text-sparse scenarios. To address these issues, this paper proposes a time-aware dynamic dewarping paradigm that models document flattening as a progressive process with multiple intermediate states. We introduce TADoc, a lightweight network that integrates sequential state modeling with efficient parameterization. We further propose Document Layout Similarity (DLS), a layout-aware metric that explicitly quantifies how faithfully the geometric structure of a document is recovered, overcoming the limitations of traditional OCR metrics in sparse-text settings. Extensive experiments on diverse benchmarks with varying distortion types and severities demonstrate state-of-the-art performance, and the method improves robustness and accuracy for downstream OCR and document understanding tasks.
📝 Abstract
Flattening curved, wrinkled, and rotated document images captured by portable capture devices, a task termed document image dewarping, has become increasingly important with the rise of the digital economy and online work. Although many methods have been proposed recently, they often struggle to produce satisfactory results when confronted with intricate document structures and severe deformations in real-world scenarios. Our main insight is that, unlike other document restoration tasks (e.g., deblurring), dewarping in real physical scenes is a progressive motion rather than a one-step transformation. Based on this insight, we make two key contributions. First, we reformulate the task, modeling it for the first time as a dynamic process that encompasses a series of intermediate states. Second, we design a lightweight framework called TADoc (Time-Aware Document Dewarping Network) to correct the geometric distortion of document images. In addition, because OCR metrics are inadequate for document images containing sparse text, existing evaluations are not comprehensive. To address this shortcoming, we propose a new metric, DLS (Document Layout Similarity), to evaluate the effectiveness of document dewarping in downstream tasks. Extensive experiments and in-depth evaluations indicate that our model is highly robust and outperforms existing methods on several benchmarks covering different document types and degrees of distortion.