🤖 AI Summary
To address the high power consumption and deployment cost of deep learning models for real-time denoising of high-resolution video, this paper proposes ReTiDe, an FPGA-accelerated denoising system served from data-center FPGAs. A compact convolutional network is quantized to INT8 (post-training quantization followed by quantization-aware fine-tuning) and compiled for AMD Deep Learning Processor Unit (DPU)-based FPGAs. A client-server framework offloads inference from the host CPU/GPU to a networked FPGA service while remaining callable from industry-standard post-production tools (e.g., NUKE). The key contributions are: (1) support for both encoding-loop and post-production workflows; (2) negligible degradation in PSNR and SSIM; (3) 37.71× higher GOPS throughput and (4) 5.29× higher energy efficiency than prior FPGA denoising accelerators; and (5) lower per-frame energy consumption without sacrificing visual quality or workflow compatibility.
📝 Abstract
Denoising is a core operation in modern video pipelines. In codecs, in-loop filters suppress sensor noise and quantisation artefacts to improve rate-distortion performance; in cinema post-production, denoisers are used for restoration, grain management, and plate clean-up. However, state-of-the-art deep denoisers are computationally intensive and, at scale, are typically deployed on GPUs, incurring high power and cost for real-time, high-resolution streams. This paper presents Real-Time Denoise (ReTiDe), a hardware-accelerated denoising system that serves inference on data-centre Field Programmable Gate Arrays (FPGAs). A compact convolutional model is quantised (post-training quantisation plus quantisation-aware fine-tuning) to INT8 and compiled for AMD Deep Learning Processor Unit (DPU)-based FPGAs. A client-server integration offloads computation from the host CPU/GPU to a networked FPGA service, while remaining callable from existing workflows, e.g., NUKE, without disrupting artist tooling. On representative benchmarks, ReTiDe delivers 37.71$\times$ Giga Operations Per Second (GOPS) throughput and 5.29$\times$ higher energy efficiency than prior FPGA denoising accelerators, with negligible degradation in Peak Signal-to-Noise Ratio (PSNR)/Structural Similarity Index (SSIM). These results indicate that specialised accelerators can provide practical, scalable denoising for both encoding pipelines and post-production, reducing energy per frame without sacrificing quality or workflow compatibility. Code is available at https://github.com/RCSL-TCD/ReTiDe.
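The abstract mentions INT8 quantisation and a PSNR check for quality loss. The following is a minimal, illustrative sketch of those two ideas in plain Python, not the ReTiDe pipeline itself. It assumes symmetric per-tensor quantisation with a single scale factor; the function names (`quantise_int8`, `dequantise`, `psnr`) and the synthetic ramp signal are hypothetical and chosen only for this example.

```python
import math


def quantise_int8(values):
    """Symmetric per-tensor INT8 quantisation (assumed scheme, for illustration).

    Returns the integer codes and the scale that maps codes back to floats.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the signed 8-bit range.
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantise(codes, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [c * scale for c in codes]


def psnr(reference, test, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length signals."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Synthetic stand-in for a tensor: a ramp of values in [0, 1].
signal = [i / 255.0 for i in range(256)]
codes, scale = quantise_int8(signal)
restored = dequantise(codes, scale)
quality_db = psnr(signal, restored)
print(f"scale={scale:.6f}, PSNR after INT8 round trip={quality_db:.2f} dB")
```

The round trip only measures the error introduced by quantising one tensor. The paper's reported PSNR/SSIM figures come from running the full quantised network on real benchmarks, which this sketch does not attempt to reproduce.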