🤖 AI Summary
Image restoration faces critical challenges including ineffective feature fusion, high computational complexity, and inefficient diffusion processes. To address these, we propose DiffRWKVIR, a novel framework that integrates test-time training (TTT) with an efficient diffusion mechanism. Its core contributions are: (1) Omni-Scale 2D State Evolution, enabling global modeling with linear complexity; (2) Chunk-Optimized Flash Processing, enhancing parallelism and throughput; and (3) Prior-Guided Efficient Diffusion, which extracts a compact image prior representation (IPR) in only 5–20 diffusion steps. The framework combines an enhanced RWKV architecture, multi-directional 2D scanning, contiguous chunk-wise parallel computation, and IPR-driven TTT for dynamic adaptation. Extensive experiments on super-resolution and inpainting benchmarks show that DiffRWKVIR outperforms SwinIR, HAT, and MambaIR/v2 in PSNR, SSIM, and LPIPS, while training and inference run 45% faster than DiffIR and hardware utilization improves.
📝 Abstract
Image restoration faces challenges including ineffective feature fusion, computational bottlenecks, and inefficient diffusion processes. To address these, we propose DiffRWKVIR, a novel framework unifying Test-Time Training (TTT) with efficient diffusion. Our approach introduces three key innovations: (1) Omni-Scale 2D State Evolution extends RWKV's location-dependent parameterization to hierarchical multi-directional 2D scanning, enabling global contextual awareness with linear complexity O(L) in the sequence length L; (2) Chunk-Optimized Flash Processing accelerates intra-chunk parallelism by 3.2x via contiguous chunk processing (O(LCd) complexity), reducing sequential dependencies and computational overhead; (3) Prior-Guided Efficient Diffusion extracts a compact Image Prior Representation (IPR) in only 5–20 steps, yielding 45% faster training and inference than DiffIR and addressing the computational inefficiency of denoising. Evaluated across super-resolution and inpainting benchmarks (Set5, Set14, BSD100, Urban100, Places365), DiffRWKVIR outperforms SwinIR, HAT, and MambaIR/v2 in PSNR, SSIM, LPIPS, and efficiency metrics. Our method establishes a new paradigm for adaptive, high-efficiency image restoration with optimized hardware utilization.
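To make the first two ideas more concrete, the following is a minimal NumPy sketch of multi-directional 2D scanning combined with chunk-wise evaluation of a linear recurrence. It is not the DiffRWKVIR implementation: the function names are hypothetical, the four scan directions are one plausible choice, and a single scalar `decay` stands in for RWKV's learned, location-dependent parameters. The chunked version does O(C²d) work per chunk across L/C chunks, which gives the O(LCd) total quoted in the abstract for chunk size C and feature width d.

```python
import numpy as np

def scan_orders(h, w):
    """Index permutations for four scan directions over an h x w grid (assumed set)."""
    idx = np.arange(h * w).reshape(h, w)
    row_major = idx.reshape(-1)    # left-to-right, top-to-bottom
    col_major = idx.T.reshape(-1)  # top-to-bottom, left-to-right
    return [row_major, row_major[::-1], col_major, col_major[::-1]]

def recurrence_sequential(x, decay):
    """Reference recurrence s_t = decay * s_{t-1} + x_t, one token at a time."""
    s = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        s = decay * s + x[t]
        out[t] = s
    return out

def recurrence_chunked(x, decay, chunk):
    """Same recurrence, but every chunk is evaluated with one matrix product."""
    length, d = x.shape
    out = np.empty_like(x)
    s = np.zeros(d)  # state carried across chunk boundaries
    for start in range(0, length, chunk):
        xc = x[start:start + chunk]
        c = xc.shape[0]
        t = np.arange(c)
        # Lower-triangular matrix with M[i, j] = decay**(i - j) for j <= i.
        m = np.tril(np.power(decay, (t[:, None] - t[None, :]).astype(float)))
        # Intra-chunk contribution plus the decayed state from earlier chunks.
        out[start:start + c] = m @ xc + np.power(decay, t + 1.0)[:, None] * s
        s = out[start + c - 1]
    return out

def multi_directional(feat, decay=0.9, chunk=16):
    """Average the chunked recurrence over all scan directions of an (h, w, d) map."""
    h, w, d = feat.shape
    flat = feat.reshape(h * w, d)
    acc = np.zeros_like(flat)
    orders = scan_orders(h, w)
    for order in orders:
        y = recurrence_chunked(flat[order], decay, chunk)
        acc[order] += y  # scatter results back to their original pixel positions
    return (acc / len(orders)).reshape(h, w, d)
```

Calling `multi_directional` on an `(h, w, d)` feature map returns a map of the same shape in which every pixel has aggregated context from all four scan directions. The sequential and chunked recurrences agree up to floating-point error; the chunked form only reorganizes the work so that most of it is dense matrix arithmetic within contiguous blocks.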