🤖 AI Summary
Existing document restoration methods rely either on monolithic single-task models—leading to system bloat and poor scalability—or on unified models constrained by handcrafted prompts and fixed priors, limiting cross-task synergy. This paper proposes a diffusion-based multi-task joint restoration framework. Its core contributions are: (1) a learnable task prompt mechanism enabling adaptive task control; (2) a Prior Pool that dynamically stores multi-scale restoration priors; and (3) a Prior Fusion Module (PFM) that adaptively integrates local high-frequency and global low-frequency priors. The framework supports end-to-end restoration of diverse degradations—including text erasure, ink corruption, and crease removal—achieving state-of-the-art results or performance comparable to specialized models across multiple benchmarks. Crucially, it generalizes zero-shot to unseen tasks, significantly improving generalizability and practical deployment efficiency.
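The learnable task prompt mechanism can be pictured as a bank of trainable embedding vectors, one per restoration task, that condition the shared diffusion backbone. The following is a minimal NumPy sketch of that idea; the names (`prompt_bank`, `task_prompt`) and the task list are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

# Hypothetical task prompt bank: one learnable vector per task.
# In the real model these would be trainable parameters; here they
# are just randomly initialized for illustration.
rng = np.random.default_rng(0)
NUM_TASKS, PROMPT_DIM = 4, 16  # e.g. text erasure, ink removal, crease removal, ...
prompt_bank = rng.normal(size=(NUM_TASKS, PROMPT_DIM))

def task_prompt(task_id: int) -> np.ndarray:
    """Look up the conditioning vector for a given task id.

    The returned vector would be injected into the diffusion model
    (e.g. added to timestep embeddings or cross-attended) to steer
    restoration toward the selected task.
    """
    return prompt_bank[task_id]
```

Because the prompts are learned rather than handcrafted, adding a new task amounts to appending one row to the bank and fine-tuning, which is what makes the design scalable.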
📝 Abstract
Removing various degradations from damaged documents greatly benefits digitization, downstream document analysis, and readability. Previous methods often treat each restoration task independently with dedicated models, leading to a cumbersome and highly complex document processing system. Although recent studies attempt to unify multiple tasks, they often suffer from limited scalability due to handcrafted prompts and heavy preprocessing, and fail to fully exploit inter-task synergy within a shared architecture. To address these challenges, we propose Uni-DocDiff, a Unified and highly scalable Document restoration model based on Diffusion. Uni-DocDiff introduces a learnable task prompt design, ensuring exceptional scalability across diverse tasks. To further enhance its multi-task capabilities and address potential task interference, we devise a novel Prior Pool, a simple yet comprehensive mechanism that combines both local high-frequency features and global low-frequency features. Additionally, we design the Prior Fusion Module (PFM), which enables the model to adaptively select the most relevant prior information for each specific task. Extensive experiments show that the versatile Uni-DocDiff achieves performance comparable or even superior to that of task-specific expert models, while retaining the task scalability needed for seamless adaptation to new tasks.
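The adaptive selection performed by the Prior Fusion Module can be sketched as a learned gate that mixes local high-frequency and global low-frequency priors per feature. The NumPy snippet below is a minimal sketch under that assumption; the function name, the concatenate-then-sigmoid gating, and the weight shapes are illustrative choices, not the paper's exact architecture.

```python
import numpy as np

def prior_fusion(hf: np.ndarray, lf: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical gated fusion of two priors.

    hf: local high-frequency prior features, shape (batch, tokens, dim)
    lf: global low-frequency prior features, shape (batch, tokens, dim)
    w:  learned gating weights, shape (2 * dim, dim)

    A sigmoid gate computed from both priors yields per-feature
    mixing coefficients in (0, 1), letting the model emphasize
    whichever prior suits the current task.
    """
    concat = np.concatenate([hf, lf], axis=-1)        # (batch, tokens, 2*dim)
    gate = 1.0 / (1.0 + np.exp(-(concat @ w)))        # sigmoid -> (batch, tokens, dim)
    return gate * hf + (1.0 - gate) * lf              # convex combination of priors
```

With zero weights the gate is exactly 0.5 everywhere, so the output is the plain average of the two priors; training would move the gate toward task-appropriate mixtures (e.g. favoring high-frequency features for text erasure, low-frequency ones for crease removal).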