🤖 AI Summary
Problem: Generative face video compression (GFVC) lacks standardized solutions, hindering interoperability and deployment.
Method: This paper proposes a GFVC framework based on Supplemental Enhancement Information (SEI), in which compact spatial and temporal representations of a face video (2D/3D key-points, facial semantics, and compact features) are coded as SEI messages and inserted into the coded video bitstream.
Contribution/Results: At the time of writing, the framework has been adopted by JVET into the working draft of the Versatile Supplemental Enhancement Information (VSEI) standard, which is to be published as a new version of ITU-T H.274 | ISO/IEC 23002-7. To the authors' knowledge, this is the first standardization activity for generative video compression. The new SEI definition combines the idea of early Model-Based Coding (MBC) with state-of-the-art generative techniques. Experimental results indicate strong rate-distortion performance compared with the Versatile Video Coding (VVC) standard. The framework also potentially enables functionalities such as user-specified animation and filtering and metaverse-related applications, providing a basis for future GFVC applications and deployment.
📝 Abstract
This paper proposes a Generative Face Video Compression (GFVC) approach using Supplemental Enhancement Information (SEI), in which a series of compact spatial and temporal representations of a face video signal (i.e., 2D/3D key-points, facial semantics, and compact features) can be coded as SEI messages and inserted into the coded video bitstream. At the time of writing, the proposed GFVC approach using SEI messages has been adopted into the official working draft of the Versatile Supplemental Enhancement Information (VSEI) standard by the Joint Video Experts Team (JVET) of ISO/IEC JTC 1/SC 29 and ITU-T SG16, and will be standardized as a new version of "ITU-T H.274 | ISO/IEC 23002-7". To the best of the authors' knowledge, the JVET work on the proposed SEI-based GFVC approach is the first standardization activity for generative video compression. The proposed SEI approach not only improves on the reconstruction quality of early Model-Based Coding (MBC) by using state-of-the-art generative techniques, but also establishes a new SEI definition for future GFVC applications and deployment. Experimental results show that the proposed SEI-based GFVC approach achieves remarkable rate-distortion performance compared with the latest Versatile Video Coding (VVC) standard, while also potentially enabling a wide variety of functionalities, including user-specified animation/filtering and metaverse-related applications.