DHQA-4D: Perceptual Quality Assessment of Dynamic 4D Digital Human

📅 2025-10-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dynamic 4D digital humans are highly susceptible to noise during acquisition, compression, and transmission, degrading perceptual quality and necessitating dedicated objective quality assessment methods. To address this, we introduce DHQA-4D—the first large-scale, human-centered 4D quality assessment dataset—featuring diverse distortions and both textured and textureless 4D human meshes. We further propose DynaMesh-Rater, a novel multimodal quality assessment framework that enables unified quality prediction for 4D human meshes: it extracts visual features from multi-view 2D videos, models temporal dynamics via motion segments, encodes geometric and topological structure, and fuses these heterogeneous representations using a large multimodal model (LMM), fine-tuned end-to-end via LoRA-based instruction tuning. Extensive experiments on DHQA-4D demonstrate that DynaMesh-Rater significantly outperforms existing metrics in accuracy and cross-distortion generalization, establishing a new paradigm for objective 4D digital human quality evaluation.

📝 Abstract
With the rapid development of 3D scanning and reconstruction technologies, dynamic digital human avatars based on 4D meshes have become increasingly popular. A high-precision dynamic digital human avatar can be applied to various fields such as game production, animation generation, and remote immersive communication. However, these 4D human avatar meshes are prone to degradation by various types of noise during collection, compression, and transmission, which affects the viewing experience of users. In light of this, quality assessment of dynamic 4D digital humans becomes increasingly important. In this paper, we first propose a large-scale dynamic digital human quality assessment dataset, DHQA-4D, which contains 32 high-quality real-scanned 4D human mesh sequences, 1,920 distorted textured 4D human meshes degraded by 11 types of textured distortion, as well as their corresponding textured and non-textured mean opinion scores (MOSs). Equipped with the DHQA-4D dataset, we analyze the influence of different types of distortion on human perception for both textured and non-textured dynamic 4D meshes. Additionally, we propose DynaMesh-Rater, a novel large multimodal model (LMM)-based approach that can assess both textured and non-textured 4D meshes. Concretely, DynaMesh-Rater extracts multi-dimensional features, including visual features from a projected 2D video, motion features from cropped video clips, and geometry features from the 4D human mesh, to provide comprehensive quality-related information. We then utilize an LMM to integrate the multi-dimensional features and apply a LoRA-based instruction-tuning technique to teach it to predict quality scores. Extensive experimental results on the DHQA-4D dataset demonstrate the superiority of our DynaMesh-Rater method over previous quality assessment methods.
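The three-branch feature extraction and fusion described in the abstract can be outlined as a minimal sketch. The extractors below are hypothetical stubs standing in for the paper's learned encoders (projected-video visual encoder, clip-level motion encoder, mesh geometry encoder), and a linear head stands in for the LMM; none of this is the paper's actual architecture.

```python
from typing import List

Vec = List[float]

# Hypothetical stand-ins for DynaMesh-Rater's three feature branches;
# these toy functions only illustrate the fusion structure.
def visual_features(frames: List[Vec]) -> Vec:
    """Average per-frame descriptors from the projected 2D video."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[i] for f in frames) / n for i in range(dim)]

def motion_features(clips: List[List[Vec]]) -> Vec:
    """Mean frame-to-frame difference magnitude per clip (toy motion proxy)."""
    feats = []
    for clip in clips:
        diffs = [
            sum(abs(a - b) for a, b in zip(clip[t], clip[t + 1]))
            for t in range(len(clip) - 1)
        ]
        feats.append(sum(diffs) / max(len(diffs), 1))
    return feats

def geometry_features(vertices: List[Vec]) -> Vec:
    """Centroid of the mesh vertices (toy geometric descriptor)."""
    n = len(vertices)
    return [sum(v[i] for v in vertices) / n for i in range(3)]

def predict_mos(frames: List[Vec], clips: List[List[Vec]],
                vertices: List[Vec], weights: Vec, bias: float) -> float:
    """Concatenate the three branches and regress a quality score,
    clamped to the usual 1-5 MOS range."""
    fused = (visual_features(frames) + motion_features(clips)
             + geometry_features(vertices))
    score = bias + sum(w * x for w, x in zip(weights, fused))
    return max(1.0, min(5.0, score))
```

With two 2-D frames, one 3-frame clip, and two vertices, the fused vector has 2 + 1 + 3 dimensions, and the head maps it to a scalar score in [1, 5].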
Problem

Research questions and friction points this paper is trying to address.

Assessing perceptual quality degradation in dynamic 4D human avatars
Evaluating impact of various distortions on textured and non-textured 4D meshes
Developing quality assessment methods for compressed 4D human mesh sequences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Creates large-scale dataset with distorted 4D human meshes
Proposes multimodal model extracting visual, motion, and geometry features
Uses LoRA instruction tuning to predict quality scores
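The LoRA bullet above can be made concrete with a toy numeric sketch: instead of updating a full weight matrix W, LoRA trains two low-rank factors B (d×r) and A (r×k) and applies W' = W + (α/r)·BA, so only r·(d+k) parameters are trainable while W stays frozen. This is a generic illustration of the LoRA update rule, not the paper's actual tuning configuration.

```python
from typing import List

Matrix = List[List[float]]

def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [
        [sum(X[i][t] * Y[t][j] for t in range(len(Y)))
         for j in range(len(Y[0]))]
        for i in range(len(X))
    ]

def lora_update(W: Matrix, B: Matrix, A: Matrix,
                alpha: float, r: int) -> Matrix:
    """Apply the LoRA update W' = W + (alpha / r) * (B @ A).

    Only B and A are trained; the frozen base weight W is untouched.
    """
    delta = matmul(B, A)
    s = alpha / r
    return [
        [W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
        for i in range(len(W))
    ]

# Toy sizes: a 4x4 frozen weight with a rank-1 adapter trains
# only 4 + 4 = 8 parameters instead of 16.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
B = [[1.0], [0.0], [0.0], [0.0]]   # 4x1, trainable
A = [[0.0, 2.0, 0.0, 0.0]]        # 1x4, trainable
W_adapted = lora_update(W, B, A, alpha=1.0, r=1)
```

Because B and A are tiny relative to W, instruction tuning touches a small fraction of the LMM's weights, which is what makes the fine-tuning in DynaMesh-Rater parameter-efficient.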
👥 Authors
Yunhao Li
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
Sijing Wu
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
Yucheng Zhu
Shanghai Jiao Tong University (Multimedia Signal Processing)
Huiyu Duan
Shanghai Jiao Tong University (Multimedia Signal Processing)
Zicheng Zhang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Professor, IEEE Fellow, Shanghai Jiao Tong University (Multimedia Signal Processing, Visual Quality Assessment, QoE, AI Evaluation, Displays)