ReTiDe: Real-Time Denoising for Energy-Efficient Motion Picture Processing with FPGAs

📅 2025-10-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the high power consumption and deployment cost of deep learning models for real-time denoising of high-resolution video, this paper proposes ReTiDe, a denoising system that serves inference on datacenter FPGAs. A compact convolutional network is quantized to INT8 (post-training quantization followed by quantization-aware fine-tuning) and compiled for AMD Deep Learning Processor Unit (DPU)-based FPGAs. A client-server framework offloads computation from the host CPU/GPU to a networked FPGA service while remaining callable from industry-standard post-production tools such as NUKE. The key contributions are: (1) support for both encoding pipelines and post-production workflows; (2) negligible degradation in PSNR and SSIM; (3) 37.71× the GOPS throughput of prior FPGA denoising accelerators; (4) 5.29× higher energy efficiency than those accelerators; and (5) lower energy per frame without sacrificing quality or workflow compatibility.

📝 Abstract

Denoising is a core operation in modern video pipelines. In codecs, in-loop filters suppress sensor noise and quantisation artefacts to improve rate-distortion performance; in cinema post-production, denoisers are used for restoration, grain management, and plate clean-up. However, state-of-the-art deep denoisers are computationally intensive and, at scale, are typically deployed on GPUs, incurring high power and cost for real-time, high-resolution streams. This paper presents Real-Time Denoise (ReTiDe), a hardware-accelerated denoising system that serves inference on data-centre Field Programmable Gate Arrays (FPGAs). A compact convolutional model is quantised (post-training quantisation plus quantisation-aware fine-tuning) to INT8 and compiled for AMD Deep Learning Processor Unit (DPU)-based FPGAs. A client-server integration offloads computation from the host CPU/GPU to a networked FPGA service, while remaining callable from existing workflows, e.g., NUKE, without disrupting artist tooling. On representative benchmarks, ReTiDe delivers 37.71$\times$ Giga Operations Per Second (GOPS) throughput and 5.29$\times$ higher energy efficiency than prior FPGA denoising accelerators, with negligible degradation in Peak Signal-to-Noise Ratio (PSNR)/Structural Similarity Index (SSIM). These results indicate that specialised accelerators can provide practical, scalable denoising for both encoding pipelines and post-production, reducing energy per frame without sacrificing quality or workflow compatibility. Code is available at https://github.com/RCSL-TCD/ReTiDe.
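
The abstract's INT8 quantisation step can be illustrated with a minimal sketch. This is not the authors' toolchain: it assumes a simple symmetric per-tensor scheme, and the function names (`quantize_int8`, `dequantize_int8`, `fake_quantize`) are hypothetical. The "fake quantise" round trip shown here is the kind of operation that quantisation-aware fine-tuning typically places in the forward pass so the model can adapt to rounding error.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantisation (assumed scheme, not the paper's).

    Maps the largest magnitude in `values` to 127 and rounds every value
    to the nearest integer code, clamped to [-127, 127].
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero for an all-zero (or empty) tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def fake_quantize(values: List[float]) -> List[float]:
    """Quantise then dequantise, exposing the rounding error in float form."""
    codes, scale = quantize_int8(values)
    return dequantize_int8(codes, scale)
```

With this scheme the reconstruction error of each value is at most half the scale, which is one way to see why a well-conditioned compact model can lose little PSNR/SSIM after quantisation.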

Problem

Research questions and friction points this paper is trying to address.

Real-time video denoising requires high computational power

Deep learning denoisers consume excessive energy on GPUs

FPGA accelerators need efficient deployment without quality loss
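
The energy concerns above come down to two simple quantities: energy per frame and throughput per watt. The sketch below shows how they are usually computed; the helper names are hypothetical and no figures from the paper are used.

```python
def energy_per_frame_joules(power_watts: float, frames_per_second: float) -> float:
    """Energy spent on one frame: average power divided by frame rate."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return power_watts / frames_per_second


def gops_per_watt(gops: float, power_watts: float) -> float:
    """Energy efficiency expressed as throughput per watt."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return gops / power_watts


def efficiency_gain(candidate: float, baseline: float) -> float:
    """Ratio of two efficiency figures, e.g. an 'N times higher' claim."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline
```

For example, a device drawing 30 W while processing 60 frames per second (placeholder numbers, not measurements) spends `energy_per_frame_joules(30.0, 60.0)` joules on each frame.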

Innovation

Methods, ideas, or system contributions that make the work stand out.

FPGA-accelerated denoising system for real-time video

INT8 quantized compact convolutional model on DPU

Client-server integration maintains workflow compatibility
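
To make the client-server idea concrete, here is a minimal sketch of how a host tool might frame a request to a networked denoising service. The wire format (a 12-byte header followed by raw pixels) and every name in it are assumptions made for illustration; the paper's actual protocol is not described in this summary. Transport (sockets, RPC) is left out so the framing logic stays self-contained.

```python
import struct
from typing import Callable, Tuple

# Hypothetical header layout: frame_id (uint32), width (uint16),
# height (uint16), payload length in bytes (uint32), network byte order.
HEADER = struct.Struct("!IHHI")


def pack_frame(frame_id: int, width: int, height: int, pixels: bytes) -> bytes:
    """Serialise one frame as header + raw pixel payload."""
    return HEADER.pack(frame_id, width, height, len(pixels)) + pixels


def unpack_frame(message: bytes) -> Tuple[int, int, int, bytes]:
    """Parse a message produced by pack_frame, rejecting truncated payloads."""
    frame_id, width, height, length = HEADER.unpack_from(message)
    pixels = message[HEADER.size:HEADER.size + length]
    if len(pixels) != length:
        raise ValueError("truncated frame payload")
    return frame_id, width, height, pixels


def handle_request(message: bytes,
                   denoise: Callable[[bytes, int, int], bytes]) -> bytes:
    """Server side: decode a frame, run the denoiser, encode the reply.

    `denoise` stands in for the accelerator call; here it is any callable
    taking (pixels, width, height) and returning processed pixels.
    """
    frame_id, width, height, pixels = unpack_frame(message)
    return pack_frame(frame_id, width, height, denoise(pixels, width, height))
```

Keeping the frame identifier in the reply lets a client match responses to requests even if several frames are in flight, which is one reason a design like this can sit behind an existing tool without changing how artists work.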

🔎 Similar Papers

An Efficient Real-Time Object Detection Framework on Resource-Constricted Hardware Devices via Software and Hardware Co-design