Rethinking Generative Human Video Coding with Implicit Motion Transformation

📅 2025-06-12

📈 Citations: 0

✨ Influential: 0

career value

228K/year

🤖 AI Summary

To address reconstruction distortion and inaccurate motion modeling caused by explicit motion estimation in human video compression, this paper proposes the Implicit Motion Transformation (IMT) paradigm. IMT abandons explicit optical flow estimation, instead mapping video frames into compact visual features and learning, in an end-to-end manner, a nonlinear transformation from these features to implicit motion guidance signals. The method integrates implicit neural representations, motion modeling in feature space, and a learnable implicit transformation module, trained under rate-distortion optimization. As the first work to introduce implicit motion modeling into generative human video coding, IMT achieves state-of-the-art performance on standard benchmarks: it reduces BD-rate by 18.7% and improves PSNR by 2.3 dB over prior methods, while significantly enhancing motion consistency and fidelity of texture details.

Technology Category

Application Category

📝 Abstract

Beyond traditional hybrid-based video codec, generative video codec could achieve promising compression performance by evolving high-dimensional signals into compact feature representations for bitstream compactness at the encoder side and developing explicit motion fields as intermediate supervision for high-quality reconstruction at the decoder side. This paradigm has achieved significant success in face video compression. However, compared to facial videos, human body videos pose greater challenges due to their more complex and diverse motion patterns, i.e., when using explicit motion guidance for Generative Human Video Coding (GHVC), the reconstruction results could suffer severe distortions and inaccurate motion. As such, this paper highlights the limitations of explicit motion-based approaches for human body video compression and investigates the GHVC performance improvement with the aid of Implicit Motion Transformation, namely IMT. In particular, we propose to characterize complex human body signal into compact visual features and transform these features into implicit motion guidance for signal reconstruction. Experimental results demonstrate the effectiveness of the proposed IMT paradigm, which can facilitate GHVC to achieve high-efficiency compression and high-fidelity synthesis.

Problem

Research questions and friction points this paper is trying to address.

Improving human video compression using implicit motion transformation

Addressing complex motion patterns in generative human video coding

Enhancing reconstruction quality and compression efficiency in GHVC

Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit Motion Transformation for human video coding

Compact visual features for signal reconstruction

High-efficiency compression with high-fidelity synthesis

🔎 Similar Papers

Generalizable Implicit Motion Modeling for Video Frame Interpolation