🤖 AI Summary
Existing audiovisual quality assessment datasets are limited in scale, lack diversity in content and quality degradation types, and provide only holistic scores, hindering research on multimodal perception mechanisms. To address these limitations, this work proposes a crowdsourced subjective evaluation framework that moves beyond traditional laboratory constraints, combined with a systematic data sampling strategy and a multidimensional annotation scheme. The result is YT-NTU-AVQ, the largest and most diverse audiovisual quality assessment dataset to date, comprising 1,620 user-generated videos spanning a broad range of semantic scenarios and quality levels. The dataset and platform code are publicly released to support research on multimodal perceptual modeling.
📝 Abstract
Audio-visual quality assessment (AVQA) research has been stalled by the limitations of existing datasets: they are typically small in scale, offer insufficient diversity in content and quality, and are annotated only with overall scores. These shortcomings provide limited support for model development and multimodal perception research. We propose a practical approach to AVQA dataset construction. First, we design a crowdsourced subjective experiment framework for AVQA that breaks the constraints of in-lab settings and achieves reliable annotation across varied environments. Second, we employ a systematic data preparation strategy to ensure broad coverage of both quality levels and semantic scenarios. Third, we extend the dataset with additional annotations, enabling research on multimodal perception mechanisms and their relation to content. Finally, we validate this approach through YT-NTU-AVQ, the largest and most diverse AVQA dataset to date, consisting of 1,620 user-generated audio and video (A/V) sequences. The dataset and platform code are available at https://github.com/renyu12/YT-NTU-AVQ.
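As a rough illustration of the data preparation strategy described above, one could stratify candidate clips over a grid of semantic category and estimated quality, then sample evenly from each cell. The sketch below is a minimal example of this idea, assuming each candidate clip already carries a semantic cluster label and a rough quality estimate in [0, 1]; the function and field names are hypothetical and not taken from the released codebase.

```python
import random
from collections import defaultdict

def stratified_sample(clips, n_quality_bins=5, per_cell=4, seed=0):
    """Sample clips evenly across a (semantic cluster x quality bin) grid.

    clips: list of dicts with keys 'id', 'cluster' (semantic label),
           and 'quality' (a rough quality estimate in [0, 1]).
    Returns a subset of clips with broad coverage of both axes.
    """
    rng = random.Random(seed)

    # Group candidates into grid cells, binning the quality estimate
    # so that low-, mid-, and high-quality content are all represented.
    cells = defaultdict(list)
    for clip in clips:
        q_bin = min(int(clip["quality"] * n_quality_bins), n_quality_bins - 1)
        cells[(clip["cluster"], q_bin)].append(clip)

    # Draw up to per_cell clips from every non-empty cell.
    selected = []
    for cell_clips in cells.values():
        rng.shuffle(cell_clips)
        selected.extend(cell_clips[:per_cell])
    return selected
```

Sampling a fixed budget per cell, rather than uniformly over the whole pool, keeps rare (cluster, quality) combinations represented, which is what broad coverage of quality levels and semantic scenarios requires.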