Exploring Diffusion with Test-Time Training on Efficient Image Restoration

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Image restoration faces critical challenges including inefficient feature fusion, high computational complexity, and redundant diffusion steps. To address these, we propose DiffRWKVIR—a novel framework that, for the first time, integrates test-time training (TTT) with an efficient diffusion mechanism. Its core contributions are: (1) Omni-Scale 2D State Evolution, enabling global modeling with linear complexity; (2) Chunk-Optimized Flash Processing, enhancing parallelism and throughput; and (3) Prior-Guided Diffusion, achieving high-fidelity denoising in only 5–20 steps. The framework synergistically combines an enhanced RWKV architecture, multi-directional 2D scanning, chunk-wise contiguous parallel computation, and Image Prior Representation (IPR)-driven TTT for dynamic adaptation. Extensive experiments demonstrate state-of-the-art performance on super-resolution and inpainting tasks—outperforming SwinIR, HAT, and MambaIR/v2 in PSNR, SSIM, and LPIPS—while accelerating both training and inference by 45% and significantly improving hardware utilization.

📝 Abstract
Image restoration faces challenges including ineffective feature fusion, computational bottlenecks, and inefficient diffusion processes. To address these, we propose DiffRWKVIR, a novel framework unifying Test-Time Training (TTT) with efficient diffusion. Our approach introduces three key innovations: (1) Omni-Scale 2D State Evolution extends RWKV's location-dependent parameterization to hierarchical multi-directional 2D scanning, enabling global contextual awareness with linear complexity O(L); (2) Chunk-Optimized Flash Processing accelerates intra-chunk parallelism by 3.2x via contiguous chunk processing (O(LCd) complexity), reducing sequential dependencies and computational overhead; (3) Prior-Guided Efficient Diffusion extracts a compact Image Prior Representation (IPR) in only 5-20 steps, yielding 45% faster training/inference than DiffIR while resolving the computational inefficiency of denoising. Evaluated across super-resolution and inpainting benchmarks (Set5, Set14, BSD100, Urban100, Places365), DiffRWKVIR outperforms SwinIR, HAT, and MambaIR/v2 in PSNR, SSIM, LPIPS, and efficiency metrics. Our method establishes a new paradigm for adaptive, high-efficiency image restoration with optimized hardware utilization.
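The abstract's first innovation pairs multi-directional 2D scanning with an RWKV-style linear-time recurrence. A minimal sketch of that idea, assuming four scan directions and a toy decay-based state update (function names and the recurrence below are illustrative, not the paper's implementation):

```python
# Hedged sketch: multi-directional 2D scanning plus a linear-complexity
# recurrence, as described in the abstract. The four directions and the
# toy decay update are assumptions for illustration only.

def scan_directions(grid):
    """Flatten an H x W grid (list of rows) into four 1-D scan orders."""
    h, w = len(grid), len(grid[0])
    row_major = [grid[i][j] for i in range(h) for j in range(w)]
    col_major = [grid[i][j] for j in range(w) for i in range(h)]
    return [
        row_major,        # left-to-right, top-to-bottom
        row_major[::-1],  # right-to-left, bottom-to-top
        col_major,        # top-to-bottom, left-to-right
        col_major[::-1],  # bottom-to-top, right-to-left
    ]

def linear_scan(seq, decay=0.9):
    """Toy recurrent state evolution: one O(L) pass over a scan order."""
    state, out = 0.0, []
    for x in seq:
        state = decay * state + x  # each position sees all earlier context
        out.append(state)
    return out

grid = [[1, 2], [3, 4]]
scans = scan_directions(grid)
states = [linear_scan(s) for s in scans]
# Fusing the four per-direction states gives every pixel a global
# receptive field at linear, rather than quadratic, cost.
```

Each of the four passes is O(L) in the number of pixels L, which is the source of the linear-complexity claim; attention-based models such as SwinIR pay a quadratic cost within each window instead.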
Problem

Research questions and friction points this paper is trying to address.

Ineffective feature fusion in image restoration
Computational bottlenecks in diffusion processes
Inefficient denoising and training in image restoration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Omni-Scale 2D State Evolution for global awareness
Chunk-Optimized Flash Processing for faster parallelism
Prior-Guided Efficient Diffusion for quick training
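The third bullet, prior-guided efficient diffusion, denoises a compact Image Prior Representation (IPR) in only a handful of steps rather than running hundreds of diffusion steps on the full image. A minimal sketch under assumed details (the linear schedule, the update rule, and the stub denoiser are illustrative, not the paper's formulation):

```python
# Hedged sketch of few-step, prior-guided diffusion: refine a small IPR
# vector in 5 steps. Schedule and update rule are toy assumptions.

def refine_ipr(noisy_ipr, predict_noise, steps=5):
    """Few-step deterministic denoising of a compact prior vector."""
    x = list(noisy_ipr)
    for t in range(steps, 0, -1):
        alpha = t / steps                 # toy linear noise schedule
        eps = predict_noise(x, t)         # learned denoiser (stubbed below)
        x = [xi - alpha * ei for xi, ei in zip(x, eps)]  # subtract noise
    return x

# Stub denoiser: pretends the noise is a fixed fraction of the current value.
denoiser = lambda x, t: [0.1 * xi for xi in x]
clean_ipr = refine_ipr([1.0, -2.0], denoiser, steps=5)
# clean_ipr then conditions the restoration backbone, so the expensive
# diffusion loop runs over a small vector, not the full-resolution image.
```

Because the loop touches only the low-dimensional IPR, cutting the step count to 5-20 directly yields the claimed training and inference speedup over image-space diffusion baselines like DiffIR.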
Rongchang Lu
School of Ecological and Environmental Engineering, Qinghai University, Xining, 810016, China
Tianduo Luo
Department of Computer Technology and Applications, Qinghai University, Xining, 810016, China
Yunzhi Zhang
Stanford University
Conghan Yue
School of Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Pei Yang
Department of Computer Technology and Applications, Qinghai University, Xining, 810016, China
Guibao Liu
Department of Computer Technology and Applications, Qinghai University, Xining, 810016, China
Changyang Gu
Department of Computer Technology and Applications, Qinghai University, Xining, 810016, China