Compressing Human Body Video with Interactive Semantics: A Generative Approach

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of balancing interactivity and reconstruction quality in highly compressed human body video. We propose a human body video compression framework that enables semantic-level interactivity. Methodologically, the encoder leverages a 3D human model to decouple complex body motion into editable semantic embeddings, and a generative decoder evolves mesh-based motion fields from the decoded semantics to reconstruct the video. As a result, semantic editing and controllable reconstruction operate directly on the bitstream, without pre- or post-processing. Our contributions are threefold: (1) a human body video codec that supports semantic-level interactivity; (2) rate-distortion performance that surpasses VVC and recent generative compression methods at ultra-low bitrates; and (3) high-fidelity reconstruction together with direct semantic manipulation of the coded representation. The framework points toward a new paradigm for digital human communication in the metaverse.

📝 Abstract
In this paper, we propose to compress human body video with interactive semantics, which makes video coding interactive and controllable through the manipulation of semantic-level representations embedded in the coded bitstream. In particular, the proposed encoder employs a 3D human model to disentangle the nonlinear dynamics and complex motion of the human body signal into a series of configurable embeddings, which can be edited in a controllable way, compactly compressed, and efficiently transmitted. Moreover, the proposed decoder evolves mesh-based motion fields from these decoded semantics to achieve high-quality human body video reconstruction. Experimental results illustrate that the proposed framework achieves promising compression performance for human body videos in ultra-low bitrate ranges compared with the state-of-the-art video coding standard, Versatile Video Coding (VVC), and the latest generative compression schemes. Furthermore, the proposed framework enables interactive human body video coding without any additional pre- or post-manipulation processes, which is expected to shed light on metaverse-related digital human communication in the future.
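The encoder-side idea of turning body motion into a small set of editable numbers that travel in the bitstream can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' codec: each frame's semantics are treated as a flat list of floats (a parametric body model such as SMPL, with its pose and shape parameters, is one common choice, but the exact parameterization is not given here), the quantization step `STEP` and the int16 packing are hypothetical, and the learned entropy coding a real codec would apply is omitted. The helpers `encode_frame`, `decode_frame`, and `edit_frame` are illustrative names.

```python
import struct
from typing import List

# Hypothetical quantization step shared by every parameter (not taken from the paper).
STEP = 0.01


def encode_frame(embedding: List[float], step: float = STEP) -> bytes:
    """Uniformly quantize one frame's semantic embedding and pack it as little-endian int16."""
    levels = [max(-32768, min(32767, round(v / step))) for v in embedding]
    # Layout: parameter count (uint16), then one int16 quantization level per parameter.
    return struct.pack(f"<H{len(levels)}h", len(levels), *levels)


def decode_frame(payload: bytes, step: float = STEP) -> List[float]:
    """Unpack the int16 levels and dequantize them back to floats."""
    (n,) = struct.unpack_from("<H", payload, 0)
    levels = struct.unpack_from(f"<{n}h", payload, 2)
    return [level * step for level in levels]


def edit_frame(payload: bytes, index: int, value: float, step: float = STEP) -> bytes:
    """Change one semantic parameter on the coded payload; no pixels are decoded or re-encoded."""
    embedding = decode_frame(payload, step)
    embedding[index] = value
    return encode_frame(embedding, step)
```

For example, `edit_frame(payload, 1, 0.5)` sets the second parameter to 0.5 and returns a payload of the same size. Because an edit only rewrites a few quantized numbers, it illustrates why manipulation at the semantic level can avoid separate pixel-domain pre- or post-processing.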
Problem

Research questions and friction points this paper is trying to address.

Compress human body video using interactive semantic representations
Enable controllable editing and efficient transmission of motion embeddings
Achieve high-quality reconstruction at ultra-low bitrates
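Why compact semantics suit ultra-low bitrates can be seen from a back-of-the-envelope rate estimate. The numbers used below are illustrative assumptions, not results from the paper: 82 parameters per frame (the size of an SMPL-style pose-plus-shape vector), 8 or 16 bits per quantized parameter, 25 frames per second, and an optional key frame whose bits are spread over a group of frames, as many generative codecs do; whether this framework transmits such a key frame is not stated here. The function name `stream_bitrate_kbps` is hypothetical.

```python
def stream_bitrate_kbps(params_per_frame: int, bits_per_param: int, fps: float,
                        key_frame_bits: int = 0, gop_frames: int = 1) -> float:
    """Estimate the rate (kbps) of a stream carrying one semantic vector per frame.

    key_frame_bits (hypothetical) is spent once per group of gop_frames frames.
    Entropy coding and temporal prediction, which would lower the rate, are ignored.
    """
    if gop_frames < 1:
        raise ValueError("gop_frames must be at least 1")
    bits_per_gop = params_per_frame * bits_per_param * gop_frames + key_frame_bits
    seconds_per_gop = gop_frames / fps
    return bits_per_gop / seconds_per_gop / 1000.0
```

Under these assumptions, `stream_bitrate_kbps(82, 16, 25)` is about 32.8 kbps before any entropy coding, and `stream_bitrate_kbps(82, 8, 25, key_frame_bits=40_000, gop_frames=250)` is about 20.4 kbps, so the per-frame semantic payload, not pixel residuals, dominates the budget.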
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D human model disentangles dynamics into configurable embeddings
Mesh-based motion fields evolve from decoded semantics
Interactive video coding without additional manipulation processes
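The decoder step of turning decoded semantics into a mesh-based motion field can be illustrated with a hand-crafted stand-in. This sketch is not the paper's generative decoder: it assumes the decoded semantics have already been converted into source and target mesh vertex positions, projects them orthographically (a simplification of a real camera model), and spreads the sparse per-vertex displacements to every pixel with inverse-distance weighting, a classic interpolation swapped in here for the learned motion-field evolution. `dense_motion_field` and `warp_nearest` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def project(v: Vec3) -> Vec2:
    """Orthographic projection onto the image plane (depth is dropped); an assumption."""
    return (v[0], v[1])


def dense_motion_field(src: Sequence[Vec3], dst: Sequence[Vec3], width: int, height: int,
                       power: float = 2.0, eps: float = 1e-8) -> List[List[Vec2]]:
    """Backward motion field: for each target pixel, the offset to sample in the source frame."""
    if not dst:
        raise ValueError("at least one mesh vertex is required")
    anchors = [project(v) for v in dst]  # where each vertex lands in the target frame
    # Per-vertex backward displacement: target position -> source position.
    flows = [(s[0] - a[0], s[1] - a[1]) for s, a in zip(map(project, src), anchors)]
    field: List[List[Vec2]] = []
    for y in range(height):
        row: List[Vec2] = []
        for x in range(width):
            wsum = fx = fy = 0.0
            for (ax, ay), (dx, dy) in zip(anchors, flows):
                dist = math.hypot(x - ax, y - ay)
                if dist < eps:  # pixel sits on a vertex: use that vertex's motion exactly
                    wsum, fx, fy = 1.0, dx, dy
                    break
                w = 1.0 / dist ** power  # closer vertices get larger weights
                wsum += w
                fx += w * dx
                fy += w * dy
            row.append((fx / wsum, fy / wsum))
        field.append(row)
    return field


def warp_nearest(image: List[List[float]], field: List[List[Vec2]]) -> List[List[float]]:
    """Backward-warp a grayscale image with nearest-neighbour sampling, clamping at borders."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            fx, fy = field[y][x]
            sx = min(w - 1, max(0, round(x + fx)))
            sy = min(h - 1, max(0, round(y + fy)))
            row.append(image[sy][sx])
        out.append(row)
    return out
```

A mesh gives a natural set of motion anchors: when one vertex moves further than its neighbours, the interpolated field stretches the texture between them. In the described framework, the decoder learns this step and a generative network synthesizes the final frame, so the fixed weighting here only conveys the data flow.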
Bolin Chen
City University of Hong Kong
Shanzhi Yin
City University of Hong Kong
Hanwei Zhu
City University of Hong Kong
Lingyu Zhu
City University of Hong Kong
Zihan Zhang
City University of Hong Kong
Jie Chen
Alibaba DAMO Academy & Hupan Laboratory
Ru-Ling Liao
Alibaba DAMO Academy & Hupan Laboratory
Shiqi Wang
City University of Hong Kong, Shenzhen Research Institute
Yan Ye
Alibaba Inc
video coding