Document Image Rectification Based on Self-Adaptive Multitask Fusion

📅 2025-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address three weaknesses of curved document image rectification (insufficient modeling of inter-task feature complementarity, weak awareness of geometric distortion, and unmitigated negative interference between tasks), this paper proposes SalmRec, a self-adaptive learnable multi-task fusion network. Methodologically, it introduces an adaptive cross-task feature aggregation module with a dual-level (global/local) gating mechanism that explicitly models both positive complementarity between tasks and suppression of negative interference. SalmRec jointly optimizes three complementary tasks: 3D coordinate regression, text-line segmentation, and background removal. Experiments show that SalmRec significantly improves rectification accuracy on three benchmarks: DIR300 and DocUNet (English) and DocReal (Chinese). Ablation studies confirm that collaboration among the tasks yields substantial dewarping gains, supporting the effectiveness of the adaptive fusion design.
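The summary says SalmRec jointly optimizes three tasks but does not give the training objective. The sketch below illustrates one common way such a joint objective is built, assuming an L1 loss for 3D coordinate regression, binary cross-entropy for the text-line and background masks, and a fixed weighted sum. The function names, the `weights` values, and the flattened-list inputs are hypothetical choices for illustration, not the paper's settings.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error, e.g. over flattened per-pixel 3D coordinates."""
    assert len(pred) == len(target) and pred
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def bce_loss(prob, label, eps=1e-7):
    """Mean binary cross-entropy, e.g. over a flattened text-line or background mask."""
    assert len(prob) == len(label) and prob
    total = 0.0
    for p, y in zip(prob, label):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(prob)


def joint_loss(outputs, targets, weights=(1.0, 0.5, 0.5)):
    """Hypothetical weighted sum of the three task losses (weights are assumptions)."""
    w_coord, w_text, w_bg = weights
    return (w_coord * l1_loss(outputs["coords"], targets["coords"])
            + w_text * bce_loss(outputs["textline"], targets["textline"])
            + w_bg * bce_loss(outputs["background"], targets["background"]))


if __name__ == "__main__":
    outputs = {"coords": [0.1, 0.2], "textline": [0.9], "background": [0.2]}
    targets = {"coords": [0.0, 0.0], "textline": [1], "background": [0]}
    print(joint_loss(outputs, targets))
```

A "self-adaptive" variant would replace the fixed `weights` with learned ones; the fixed sum is kept here only to make the structure of the objective visible.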

📝 Abstract
Deformed document image rectification is essential for real-world document understanding tasks such as layout analysis and text recognition. However, current multi-task methods, which combine tasks such as background removal, 3D coordinate prediction, and text line segmentation, often overlook the complementary features between these tasks and their interactions. To address this gap, we propose a self-adaptive learnable multi-task fusion rectification network named SalmRec. The network incorporates an inter-task feature aggregation module that adaptively improves the perception of geometric distortions, enhances feature complementarity, and reduces negative interference. We also introduce a gating mechanism that effectively balances features both within global tasks and between local tasks. Experimental results on two English benchmarks (DIR300 and DocUNet) and one Chinese benchmark (DocReal) demonstrate that our method significantly improves rectification performance. Ablation studies further highlight the positive impact of the different tasks on dewarping and confirm the effectiveness of the proposed module.

Problem

Research questions and friction points this paper is trying to address.

Rectifying deformed document images to support downstream document understanding (layout analysis, text recognition)
Modeling the task interactions that existing multi-task methods overlook
Improving geometric distortion perception and feature complementarity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-adaptive learnable multi-task fusion network
Inter-task feature aggregation module for distortion perception
Gating mechanism that balances global and local features
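The page does not give the gate equations, so the following is only a hypothetical illustration of how two-level gating over per-task features could work. It assumes one pooled feature vector per task, a local (pairwise) gate computed from cosine similarity between two tasks' features, and a global (per-task) gate that decides how much of the aggregated cross-task message to add. The names `gated_fusion`, `alpha`, `beta`, and `gamma`, and the use of cosine similarity in place of learned gate layers, are our assumptions rather than SalmRec's design; a real module would operate on convolutional feature maps with learned weights.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps avoids division by zero for all-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def gated_fusion(feats, alpha=4.0, beta=0.0, gamma=1.0):
    """Hypothetical two-level gated fusion over per-task feature vectors.

    feats: dict mapping task name -> feature vector (all the same length).
    Returns (fused features per task, local gate for each ordered task pair).
    """
    if len(feats) < 2:
        raise ValueError("need at least two tasks to fuse")
    n_others = len(feats) - 1
    fused, local_gates = {}, {}
    for i, f_i in feats.items():
        # Local (pairwise) gate: near 1 when task j's features agree with
        # task i's, near 0 when they conflict, so conflicting tasks add little.
        message = [0.0] * len(f_i)
        for j, f_j in feats.items():
            if j == i:
                continue
            g = sigmoid(alpha * cosine(f_i, f_j) + beta)
            local_gates[(i, j)] = g
            for c, value in enumerate(f_j):
                message[c] += g * value / n_others
        # Global gate: one scalar per task controlling how much of the
        # aggregated cross-task message is added to the task's own features.
        g_global = sigmoid(gamma * cosine(f_i, message))
        fused[i] = [own + g_global * msg for own, msg in zip(f_i, message)]
    return fused, local_gates


if __name__ == "__main__":
    feats = {"coord3d": [1.0, 0.0], "textline": [1.0, 0.0], "background": [-1.0, 0.0]}
    fused, gates = gated_fusion(feats)
    print(gates[("coord3d", "textline")], gates[("coord3d", "background")])
```

With these toy inputs, the agreeing `coord3d`/`textline` pair receives a local gate close to 1, while the conflicting `background` pair receives one close to 0. Using independent sigmoid gates rather than a softmax means a task can reject all cross-task input instead of being forced to distribute attention among the others.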