GRACE: Designing Generative Face Video Codec via Agile Hardware-Centric Workflow

📅 2025-11-12
🤖 AI Summary
Problem: Deploying Animation-based Generative Codecs (AGCs) on resource- and power-constrained edge devices is challenging because of their large parameter counts, their limited flexibility to adapt to evolving algorithms, and their high computational and data-transmission overhead. Method: This paper proposes a hardware-software co-design framework tailored for FPGAs, which the authors describe as the first FPGA-oriented deployment of an AGC decoder for talking-face video. The model is compressed with post-training static quantization and layer fusion, then mapped onto an overlapped co-processor accelerator that uses double-buffered pipelining and loop unrolling. Custom hardware engines handle convolution, grid sampling, and upsampling. Results: On the PYNQ-Z1 platform, the prototype consumes only 11.7 μJ per reconstructed pixel and delivers 24.9× and 4.1× higher energy efficiency than commercial CPU and GPU counterparts, respectively, a step toward practical low-power generative video decoding at the edge.

📝 Abstract
The Animation-based Generative Codec (AGC) is an emerging paradigm for talking-face video compression. However, deploying its intricate decoder on resource- and power-constrained edge devices is challenging because of its large number of parameters, its inflexibility in adapting to dynamically evolving algorithms, and the high power consumption induced by extensive computation and data transmission. This paper proposes, for the first time, a field-programmable gate array (FPGA)-oriented AGC deployment scheme for edge-computing video services. First, we analyze the AGC algorithm and apply network compression methods, including post-training static quantization and layer fusion. Next, we design an overlapped accelerator that follows the co-processor paradigm and performs computation through software-hardware co-design. The hardware processing unit comprises engines for convolution, grid sampling, upsampling, and other operations. Parallelization strategies such as double-buffered pipelines and loop unrolling are employed to fully exploit the FPGA's resources. Finally, we build an AGC FPGA prototype on the PYNQ-Z1 platform using the proposed scheme, achieving 24.9× and 4.1× higher energy efficiency than a commercial Central Processing Unit (CPU) and Graphics Processing Unit (GPU), respectively. Specifically, this FPGA system requires only 11.7 microjoules (μJ) to reconstruct one pixel.
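The abstract names post-training static quantization without giving details. As a rough illustration only, the sketch below shows one common form of it: a symmetric scale is fixed ahead of time from calibration data and then reused unchanged at inference. The 8-bit width, the symmetric scheme, the function names, and the toy values are all assumptions made for this example and are not taken from the paper.

```python
def calibrate_scale(samples, num_bits=8):
    """Derive a fixed (static) symmetric scale from calibration samples.

    Assumption: symmetric signed quantization; the paper does not state its scheme.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in samples)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to signed integers, clipping to the representable range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in q_values]


# Hypothetical calibration data and values to quantize.
calibration = [-0.9, 0.1, 0.45, 1.27, -0.3]
scale = calibrate_scale(calibration)

values = [0.5, -1.0, 2.0]  # 2.0 lies outside the calibrated range and is clipped
q_values = quantize(values, scale)
restored = dequantize(q_values, scale)
```

Because the scale is computed once offline, the hardware only needs integer arithmetic plus a fixed rescale at run time, which is the usual motivation for the static variant on FPGAs.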
Problem

Research questions and friction points this paper is trying to address.

Deploying generative face video codecs on resource-constrained edge devices
Overcoming high computational complexity and power consumption challenges
Enabling efficient video compression with agile hardware-software co-design
Innovation

Methods, ideas, or system contributions that make the work stand out.

FPGA-oriented deployment for generative face codec
Network compression via quantization and layer fusion
Hardware accelerator with parallelization optimization strategies
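The source does not say which layers are fused. A common instance of layer fusion is folding a batch-normalization layer into the preceding convolution so that only one operation runs on the accelerator. The sketch below assumes that variant and uses a 1x1 convolution evaluated at a single pixel (a matrix-vector product) to keep it short; all names and numbers are hypothetical.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into the preceding layer.

    weight: one row of input weights per output channel.
    bias, gamma, beta, mean, var: one value per output channel.
    """
    fused_w, fused_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        fused_w.append([w * scale for w in row])
        fused_b.append((b - m) * scale + bt)
    return fused_w, fused_b


def conv1x1_pixel(x, weight, bias):
    """1x1 convolution at one pixel: matrix-vector product plus bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with fixed statistics."""
    return [(yi - m) / math.sqrt(v + eps) * g + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


# Hypothetical toy parameters for two input and two output channels.
weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [1.0, 4.0]
x = [1.0, 2.0]

# Two-step reference versus the single fused layer.
reference = batchnorm(conv1x1_pixel(x, weight, bias), gamma, beta, mean, var)
fused_w, fused_b = fold_batchnorm(weight, bias, gamma, beta, mean, var)
fused = conv1x1_pixel(x, fused_w, fused_b)
```

The fused layer reproduces the two-step result up to floating-point rounding while removing one pass over the feature map, which reduces both computation and data movement.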
🔎 Similar Papers
Rui Wan
State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai 200000, China
Qi Zheng
State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai 200000, China
Ruoyu Zhang
State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai 200000, China
Bu Chen
State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai 200000, China
Jiaming Liu
Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
Min Li
State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai 200000, China
Ming’e Jing
State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai 200000, China
Jinjia Zhou
Hosei University, Koganei, Tokyo, Japan
Yibo Fan
Professor, Fudan University
Video Coding, Image Processing, Processor, VLSI Design