Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of simultaneously achieving ultra-low-bitrate compression and high-accuracy vertex prediction for human motion videos under bandwidth constraints, this paper proposes a sparse-3D-keypoint-based multi-task generative framework. Methodologically, it encodes global motion using a minimal set of semantically meaningful 3D keypoints (e.g., 18), integrates temporal coherence modeling, keypoint-aware deep generative networks, and dense motion field estimation, and jointly optimizes the video reconstruction and vertex prediction objectives so that visual fidelity and geometric consistency improve together. Experiments show that the proposed method achieves competitive compression performance against conventional codecs (H.264/H.265) and state-of-the-art generative approaches at equivalent bitrates, while enabling precise vertex prediction. The framework is particularly suitable for resource-constrained applications such as real-time motion analysis and lightweight virtual human animation.
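The bitrate appeal of transmitting only sparse 3D keypoints can be illustrated with a back-of-envelope calculation. The quantization depth and frame rate below are assumed values for illustration, not the paper's actual entropy-coding scheme:

```python
def keypoint_bitrate(num_keypoints=18, bits_per_coord=16, fps=25):
    """Bits per second needed to send raw quantized 3D keypoints.
    Assumed values: 16-bit coordinates, 25 fps (illustrative only)."""
    bits_per_frame = num_keypoints * 3 * bits_per_coord  # x, y, z per keypoint
    return bits_per_frame * fps

bps = keypoint_bitrate()
print(f"{bps / 1000:.1f} kbps")  # 18 kpts * 3 coords * 16 bits * 25 fps = 21.6 kbps
```

Even without entropy coding, this is orders of magnitude below typical video bitrates, which is what makes the keypoints viable as the sole transmitted symbols.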

📝 Abstract
For bandwidth-constrained multimedia applications, simultaneously achieving ultra-low bitrate human video compression and accurate vertex prediction remains a critical challenge, as it demands the harmonization of dynamic motion modeling, detailed appearance synthesis, and geometric consistency. To address this challenge, we propose Sparse2Dense, a keypoint-driven generative framework that leverages extremely sparse 3D keypoints as compact transmitted symbols to enable ultra-low bitrate human video compression and precise human vertex prediction. The key innovation is the multi-task learning-based and keypoint-aware deep generative model, which can encode complex human motion via compact 3D keypoints and leverage these sparse keypoints to estimate dense motion for video synthesis with temporal coherence and realistic textures. Additionally, a vertex predictor is integrated to learn human vertex geometry through joint optimization with video generation, ensuring alignment between visual content and geometric structure. Extensive experiments demonstrate that the proposed Sparse2Dense framework achieves competitive compression performance for human video over traditional/generative video codecs, whilst enabling precise human vertex prediction for downstream geometry applications. As such, Sparse2Dense is expected to facilitate bandwidth-efficient human-centric media transmission, such as real-time motion analysis, virtual human animation, and immersive entertainment.
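As a rough illustration of the sparse-to-dense motion idea, the sketch below spreads per-keypoint displacements into a dense flow field via Gaussian kernel weighting. The paper learns this mapping with a keypoint-aware generative network, so the function name and the fixed-kernel interpolation scheme here are illustrative assumptions, not the actual method:

```python
import numpy as np

def dense_motion_from_keypoints(kp_src, kp_drv, h, w, sigma=0.1):
    """Interpolate per-keypoint displacements (kp_drv - kp_src) into a
    dense (h, w, 2) motion field, weighting each keypoint by a Gaussian
    of its distance to the pixel. Coordinates are normalized to [0, 1].
    A learned network replaces this fixed kernel in practice."""
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1)               # (h, w, 2) pixel coords
    disp = kp_drv - kp_src                           # (K, 2) keypoint motion
    d2 = ((grid[:, :, None, :] - kp_src[None, None]) ** 2).sum(-1)  # (h, w, K)
    wgt = np.exp(-d2 / (2 * sigma ** 2))
    wgt /= wgt.sum(-1, keepdims=True) + 1e-8         # normalize over keypoints
    return (wgt[..., None] * disp[None, None]).sum(2)  # (h, w, 2) dense flow
```

With a single keypoint, every pixel simply inherits that keypoint's displacement; with several, nearby keypoints dominate each pixel's motion, which is the basic intuition behind driving a dense warp from sparse transmitted symbols.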
Problem

Research questions and friction points this paper is trying to address.

Achieving ultra-low bitrate human video compression and vertex prediction
Harmonizing dynamic motion modeling with appearance synthesis and geometric consistency
Enabling bandwidth-efficient human-centric media transmission through sparse keypoints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sparse 3D keypoints for ultra-low bitrate compression
Generates dense motion and textures from keypoints
Integrates vertex prediction with video generation
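The joint optimization behind the last point can be sketched as a weighted sum of a reconstruction loss and a vertex loss. The L1 photometric term, the per-vertex Euclidean term, and the weight `lam` are assumptions for illustration; the paper's exact loss composition may differ:

```python
import numpy as np

def multitask_loss(frame_pred, frame_gt, verts_pred, verts_gt, lam=1.0):
    """Toy joint objective: photometric reconstruction error plus a
    weighted mean per-vertex Euclidean error (lam is an assumed weight)."""
    l_recon = np.abs(frame_pred - frame_gt).mean()                    # L1 over pixels
    l_vertex = np.linalg.norm(verts_pred - verts_gt, axis=-1).mean()  # per-vertex L2
    return l_recon + lam * l_vertex
```

Optimizing both terms through shared features is what ties the generated video to the predicted geometry, rather than training the two heads independently.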
👥 Authors
Bolin Chen (Fudan University)
Ru-Ling Liao (DAMO Academy, Alibaba Group)
Yan Ye (Alibaba Inc)
Jie Chen (DAMO Academy, Alibaba Group)
Shanzhi Yin (City University of Hong Kong)
Xinrui Ju (City University of Hong Kong)
Shiqi Wang (City University of Hong Kong)
Yibo Fan (Professor, Fudan University)