Compressing Human Body Video with Interactive Semantics: A Generative Approach

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of balancing interactivity and reconstruction quality in highly compressed human body video. We propose a human video compression framework that enables semantic-level interactivity. Methodologically, we leverage a 3D human model to decouple motion into editable semantic embeddings; integrate mesh-based motion field evolution with a generative decoder; and support bitstream-level semantic editing and controllable reconstruction without additional pre- or post-manipulation steps. Our contributions are threefold: (1) a human video codec supporting semantic-level interactivity; (2) promising rate-distortion performance compared with VVC and recent generative compression methods at ultra-low bitrates; and (3) high-quality reconstruction combined with semantic manipulation of the coded bitstream. The framework is expected to inform future digital human communication in the metaverse.

📝 Abstract

In this paper, we propose to compress human body video with interactive semantics, which can facilitate video coding to be interactive and controllable by manipulating semantic-level representations embedded in the coded bitstream. In particular, the proposed encoder employs a 3D human model to disentangle nonlinear dynamics and complex motion of human body signal into a series of configurable embeddings, which are controllably edited, compactly compressed, and efficiently transmitted. Moreover, the proposed decoder can evolve the mesh-based motion fields from these decoded semantics to realize the high-quality human body video reconstruction. Experimental results illustrate that the proposed framework can achieve promising compression performance for human body videos at ultra-low bitrate ranges compared with the state-of-the-art video coding standard Versatile Video Coding (VVC) and the latest generative compression schemes. Furthermore, the proposed framework enables interactive human body video coding without any additional pre-/post-manipulation processes, which is expected to shed light on metaverse-related digital human communication in the future.

Problem

Research questions and friction points this paper is trying to address.

Compress human body video using interactive semantic representations

Enable controllable editing and efficient transmission of motion embeddings

Achieve high-quality reconstruction at ultra-low bitrates

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D human model disentangles dynamics into configurable embeddings

Mesh-based motion fields evolve from decoded semantics

Interactive video coding without additional manipulation processes
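
To make the idea of editing semantics inside a coded bitstream more concrete, here is a minimal sketch. It is not the paper's codec: the `FrameSemantics` layout, the parameter counts, the uniform quantization step, and the byte format are all hypothetical choices made for illustration, and the 3D human model fitting and the generative decoder are omitted entirely. It only shows how a compact per-frame embedding can be packed, edited in place, and unpacked without touching any pixels.

```python
import struct
from dataclasses import dataclass
from typing import List

STEP = 0.01  # assumed uniform quantization step (illustrative only)


@dataclass
class FrameSemantics:
    """Hypothetical per-frame embedding; field names are not from the paper."""
    pose: List[float]   # e.g. joint rotation parameters
    shape: List[float]  # e.g. body shape coefficients


def quantize(values: List[float], step: float = STEP) -> List[int]:
    # Map each float to the nearest integer multiple of the step.
    return [round(v / step) for v in values]


def dequantize(symbols, step: float = STEP) -> List[float]:
    return [s * step for s in symbols]


def encode_frame(sem: FrameSemantics) -> bytes:
    # Two-byte header (parameter counts) followed by little-endian int16 symbols.
    symbols = quantize(sem.pose) + quantize(sem.shape)
    header = struct.pack("<BB", len(sem.pose), len(sem.shape))
    return header + struct.pack(f"<{len(symbols)}h", *symbols)


def decode_frame(payload: bytes) -> FrameSemantics:
    n_pose, n_shape = struct.unpack_from("<BB", payload, 0)
    symbols = struct.unpack_from(f"<{n_pose + n_shape}h", payload, 2)
    values = dequantize(symbols)
    return FrameSemantics(pose=values[:n_pose], shape=values[n_pose:])


def edit_pose(payload: bytes, index: int, delta: float, step: float = STEP) -> bytes:
    """Shift one pose parameter by editing its symbol directly in the payload."""
    n_pose, _ = struct.unpack_from("<BB", payload, 0)
    if not 0 <= index < n_pose:
        raise IndexError("pose index out of range")
    buf = bytearray(payload)
    offset = 2 + 2 * index  # skip header, then 2 bytes per int16 symbol
    (symbol,) = struct.unpack_from("<h", buf, offset)
    struct.pack_into("<h", buf, offset, symbol + round(delta / step))
    return bytes(buf)


if __name__ == "__main__":
    frame = FrameSemantics(pose=[0.10, -0.25, 0.50], shape=[1.0, 0.0])
    coded = encode_frame(frame)
    edited = edit_pose(coded, index=1, delta=0.05)
    print(len(coded), decode_frame(edited).pose)
```

With these assumed sizes a frame costs 12 bytes before any entropy coding, and the edit changes only two bytes of the payload. A real system would entropy-code the symbols and feed the decoded semantics to a generative decoder rather than returning raw parameters.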

🔎 Similar Papers

SMC++: Masked Learning of Unsupervised Video Semantic Compression