Scaling Audio-Visual Quality Assessment Dataset via Crowdsourcing

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing audiovisual quality assessment datasets are limited in scale, lack diversity in content and quality degradation types, and provide only holistic scores, thereby hindering research on multimodal perception mechanisms. To address these limitations, this work proposes a crowdsourced subjective evaluation framework that transcends traditional laboratory constraints, integrating a systematic data sampling strategy with a multidimensional annotation scheme. This approach yields YT-NTU-AVQ, the largest and most diverse audiovisual quality assessment dataset to date, comprising 1,620 user-generated videos spanning a broad spectrum of semantic scenarios and quality levels. The dataset and associated platform code have been publicly released, significantly advancing the study and development of multimodal perceptual modeling.

📝 Abstract
Audio-visual quality assessment (AVQA) research has been stalled by limitations of existing datasets: they are typically small in scale, with insufficient diversity in content and quality, and annotated only with overall scores. These shortcomings provide limited support for model development and multimodal perception research. We propose a practical approach for AVQA dataset construction. First, we design a crowdsourced subjective experiment framework for AVQA that breaks the constraints of in-lab settings and achieves reliable annotation across varied environments. Second, a systematic data preparation strategy is employed to ensure broad coverage of both quality levels and semantic scenarios. Third, we extend the dataset with additional annotations, enabling research on multimodal perception mechanisms and their relation to content. Finally, we validate this approach through YT-NTU-AVQ, the largest and most diverse AVQA dataset to date, consisting of 1,620 user-generated audio and video (A/V) sequences. The dataset and platform code are available at https://github.com/renyu12/YT-NTU-AVQ
Problem

Research questions and friction points this paper is trying to address.

Audio-visual quality assessment
dataset limitations
multimodal perception
crowdsourcing
subjective evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

crowdsourcing
audio-visual quality assessment
multimodal perception
subjective evaluation
dataset scaling