Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of simultaneously achieving ultra-low-bitrate compression and high-accuracy vertex prediction for human motion videos under bandwidth constraints, this paper proposes a sparse-3D-keypoint-based multi-task generative framework. Methodologically, it encodes global motion using a minimal set of semantically meaningful 3D keypoints (e.g., 18), integrates temporal coherence modeling, keypoint-aware deep generative networks, and dense motion field estimation, and jointly optimizes the video reconstruction and vertex prediction objectives so that visual fidelity and geometric consistency improve together. Experiments show that the proposed method achieves competitive compression performance against conventional codecs (H.264/H.265) and state-of-the-art generative approaches at equivalent bitrates, while enabling precise vertex prediction. The framework is particularly suitable for resource-constrained applications such as real-time motion analysis and lightweight virtual human animation.
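The bitrate appeal of transmitting only sparse 3D keypoints can be illustrated with a back-of-envelope calculation. The quantization depth and frame rate below are assumed values for illustration, not the paper's actual entropy-coding scheme:

```python
def keypoint_bitrate(num_keypoints=18, bits_per_coord=16, fps=25):
    """Bits per second needed to send raw quantized 3D keypoints.
    Assumed values: 16-bit coordinates, 25 fps (illustrative only)."""
    bits_per_frame = num_keypoints * 3 * bits_per_coord  # x, y, z per keypoint
    return bits_per_frame * fps

bps = keypoint_bitrate()
print(f"{bps / 1000:.1f} kbps")  # 18 kpts * 3 coords * 16 bits * 25 fps = 21.6 kbps
```

Even without entropy coding, this is orders of magnitude below typical video bitrates, which is what makes the keypoints viable as the sole transmitted symbols.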

📝 Abstract
For bandwidth-constrained multimedia applications, simultaneously achieving ultra-low bitrate human video compression and accurate vertex prediction remains a critical challenge, as it demands the harmonization of dynamic motion modeling, detailed appearance synthesis, and geometric consistency. To address this challenge, we propose Sparse2Dense, a keypoint-driven generative framework that leverages extremely sparse 3D keypoints as compact transmitted symbols to enable ultra-low bitrate human video compression and precise human vertex prediction. The key innovation is the multi-task learning-based and keypoint-aware deep generative model, which can encode complex human motion via compact 3D keypoints and leverage these sparse keypoints to estimate dense motion for video synthesis with temporal coherence and realistic textures. Additionally, a vertex predictor is integrated to learn human vertex geometry through joint optimization with video generation, ensuring alignment between visual content and geometric structure. Extensive experiments demonstrate that the proposed Sparse2Dense framework achieves competitive compression performance for human video over traditional/generative video codecs, whilst enabling precise human vertex prediction for downstream geometry applications. As such, Sparse2Dense is expected to facilitate bandwidth-efficient human-centric media transmission, such as real-time motion analysis, virtual human animation, and immersive entertainment.
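As a rough illustration of the sparse-to-dense motion idea, the sketch below spreads per-keypoint displacements into a dense flow field via Gaussian kernel weighting. The paper learns this mapping with a keypoint-aware generative network, so the function name and the fixed-kernel interpolation scheme here are illustrative assumptions, not the actual method:

```python
import numpy as np

def dense_motion_from_keypoints(kp_src, kp_drv, h, w, sigma=0.1):
    """Interpolate per-keypoint displacements (kp_drv - kp_src) into a
    dense (h, w, 2) motion field, weighting each keypoint by a Gaussian
    of its distance to the pixel. Coordinates are normalized to [0, 1].
    A learned network replaces this fixed kernel in practice."""
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1)               # (h, w, 2) pixel coords
    disp = kp_drv - kp_src                           # (K, 2) keypoint motion
    d2 = ((grid[:, :, None, :] - kp_src[None, None]) ** 2).sum(-1)  # (h, w, K)
    wgt = np.exp(-d2 / (2 * sigma ** 2))
    wgt /= wgt.sum(-1, keepdims=True) + 1e-8         # normalize over keypoints
    return (wgt[..., None] * disp[None, None]).sum(2)  # (h, w, 2) dense flow
```

With a single keypoint, every pixel simply inherits that keypoint's displacement; with several, nearby keypoints dominate each pixel's motion, which is the basic intuition behind driving a dense warp from sparse transmitted symbols.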
Problem

Research questions and friction points this paper is trying to address.

Achieving ultra-low bitrate human video compression and vertex prediction
Harmonizing dynamic motion modeling with appearance synthesis and geometric consistency
Enabling bandwidth-efficient human-centric media transmission through sparse keypoints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sparse 3D keypoints for ultra-low bitrate compression
Generates dense motion and textures from keypoints
Integrates vertex prediction with video generation
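The joint optimization behind the last point can be sketched as a weighted sum of a reconstruction loss and a vertex loss. The L1 photometric term, the per-vertex Euclidean term, and the weight `lam` are assumptions for illustration; the paper's exact loss composition may differ:

```python
import numpy as np

def multitask_loss(frame_pred, frame_gt, verts_pred, verts_gt, lam=1.0):
    """Toy joint objective: photometric reconstruction error plus a
    weighted mean per-vertex Euclidean error (lam is an assumed weight)."""
    l_recon = np.abs(frame_pred - frame_gt).mean()                    # L1 over pixels
    l_vertex = np.linalg.norm(verts_pred - verts_gt, axis=-1).mean()  # per-vertex L2
    return l_recon + lam * l_vertex
```

Optimizing both terms through shared features is what ties the generated video to the predicted geometry, rather than training the two heads independently.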
👥 Authors
Bolin Chen (Fudan University)
Ru-Ling Liao (DAMO Academy, Alibaba Group)
Yan Ye (Alibaba Inc)
Jie Chen (DAMO Academy, Alibaba Group)
Shanzhi Yin (City University of Hong Kong)
Xinrui Ju (City University of Hong Kong)
Shiqi Wang (City University of Hong Kong)
Yibo Fan (Professor, Fudan University)