PointQ-Bench: Benchmarking Diagnostic and Interpretable Point Cloud Quality Assessment

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing point cloud quality assessment (PCQA) methods typically predict only scalar scores, which cannot support diagnostic needs such as defect identification, attribution, and interpretability. To address this limitation, this work proposes PointQ-Bench, a benchmark for diagnostic and interpretable point cloud quality evaluation comprising 3,083 samples with multidimensional annotations. The benchmark defines four tasks: anomaly sensing, defect diagnosis, usability grading, and open-ended quality reporting. Free-form quality reports are scored with SSFRQ-5D, a five-dimensional evaluation protocol. Drawing on authentic scans, simulated distortions, and AI-generated data, the study reports experiments on 14 vision-language models and traditional PCQA baselines. Results reveal a consistent gap between coarse-grained defect perception and fine-grained, grounded diagnosis, and show that strong 2D multimodal large language models generally outperform existing 3D vision-language models.

📝 Abstract

Point cloud quality plays a critical role in 3D acquisition, reconstruction, rendering, and perception, yet existing point cloud quality assessment (PCQA) research remains largely centered on scalar score prediction. In practical inspection scenarios, quality assessment often involves identifying defects, characterizing dominant issue types, assessing downstream usability, and providing evidence-supported descriptions, which are not explicitly evaluated by current benchmarks. We introduce PointQ-Bench, a benchmark designed to extend PCQA from scalar scoring toward comprehensive quality understanding. PointQ-Bench consists of 3,083 point clouds spanning authentic scans, simulated distortions, and AI-generated content, covering eight major issue types. Each sample is annotated with mean opinion scores (MOS), quality levels, issue tags, expert-grounded descriptions, and 12,332 question-answer pairs. The benchmark supports three perception-oriented tasks: anomaly sensing, defect diagnosis, and usability grading, as well as a cognition-oriented task of open-ended quality reporting. To evaluate free-form quality descriptions, we further propose SSFRQ-5D, a five-dimensional evaluation protocol validated through human-AI agreement analysis. Extensive experiments on 14 vision-language models and traditional PCQA baselines reveal a consistent perception-diagnosis gap: while current models exhibit emerging abilities in coarse defect perception, they struggle with grounded diagnosis and quality calibration. Strong 2D MLLMs generally outperform existing 3D VLMs, and the benefit of additional views or point-level inputs is non-uniform, varying across tasks, data sources, and models, particularly under boundary-ambiguous conditions. Overall, PointQ-Bench provides a diagnostic testbed for advancing reliable and interpretable point cloud quality understanding.
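The per-sample annotations listed in the abstract can be pictured as a small record type. The following is a minimal sketch, not the benchmark's actual schema: every class name, field name, and example value here is a hypothetical chosen for illustration. The only figures taken from the abstract are the totals (3,083 samples and 12,332 question-answer pairs), which work out to an average of four pairs per sample; how those pairs are distributed across tasks is not stated in the abstract.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class QAPair:
    # Hypothetical field names; the abstract only says QA pairs exist.
    task: str
    question: str
    answer: str


@dataclass
class PointCloudSample:
    # Hypothetical record mirroring the annotation types named in the abstract.
    sample_id: str
    source: str                  # e.g. authentic scan, simulated distortion, AI-generated
    opinion_scores: list[float]  # individual ratings (assumed to be available)
    quality_level: str
    issue_tags: list[str]
    description: str
    qa_pairs: list[QAPair] = field(default_factory=list)

    @property
    def mos(self) -> float:
        # A mean opinion score is the arithmetic mean of the individual ratings.
        return mean(self.opinion_scores)


# Totals reported in the abstract.
TOTAL_SAMPLES = 3083
TOTAL_QA_PAIRS = 12332
qa_per_sample = TOTAL_QA_PAIRS / TOTAL_SAMPLES

# Illustrative values only; not taken from the dataset.
example = PointCloudSample(
    sample_id="demo-0001",
    source="simulated_distortion",
    opinion_scores=[3.0, 4.0, 5.0],
    quality_level="fair",
    issue_tags=["noise"],
    description="Visible noise on flat surfaces.",
)

print(qa_per_sample)
print(example.mos)
```

Keeping the raw ratings alongside the derived MOS is a design choice of this sketch; a released dataset may well store only the aggregated score.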

Problem

Research questions and friction points this paper is trying to address.

point cloud quality assessment

diagnostic evaluation

interpretable AI

quality understanding

anomaly diagnosis

Innovation

Methods, ideas, or system contributions that make the work stand out.

PointQ-Bench

point cloud quality assessment

interpretable diagnosis