CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment

📅 2025-01-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the misalignment between traditional mean opinion score (MOS) regression and human subjective perception in no-reference point cloud quality assessment (NR-PCQA). To bridge this gap, we propose a novel semantic alignment–driven paradigm. Methodologically, we introduce vision-language alignment into PCQA for the first time: leveraging a CLIP-based architecture to achieve cross-modal feature alignment between point clouds and semantic text; designing learnable textual prompts to enhance semantic expressiveness; retrieving discrete quality descriptors (e.g., “excellent”, “poor”) via cosine similarity; and replacing scalar MOS with opinion score distribution (OSD), modeled probabilistically using contrastive loss. Extensive experiments on mainstream benchmarks demonstrate significant improvements over state-of-the-art methods, validating that semantic alignment effectively enhances both subjective consistency and robustness in NR-PCQA.

📝 Abstract
In recent years, No-Reference Point Cloud Quality Assessment (NR-PCQA) research has achieved significant progress. However, existing methods mostly seek a direct mapping function from visual data to the Mean Opinion Score (MOS), which contradicts the mechanism of practical subjective evaluation. To address this, we propose a novel language-driven PCQA method named CLIP-PCQA. Considering that human beings prefer to describe visual quality using discrete quality descriptions (e.g., "excellent" and "poor") rather than specific scores, we adopt a retrieval-based mapping strategy to simulate the process of subjective assessment. More specifically, based on the philosophy of CLIP, we calculate the cosine similarity between the visual features and multiple textual features corresponding to different quality descriptions, in which process an effective contrastive loss and learnable prompts are introduced to enhance the feature extraction. Meanwhile, given the personal limitations and bias in subjective experiments, we further convert the feature similarities into probabilities and consider the Opinion Score Distribution (OSD) rather than a single MOS as the final target. Experimental results show that our CLIP-PCQA outperforms other State-Of-The-Art (SOTA) approaches.
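The retrieval-based mapping described in the abstract can be sketched as follows. This is an illustrative outline only, not the paper's implementation: the descriptor set, feature dimension, and temperature value are assumptions, and the real method uses a trained CLIP-style encoder pair with learnable prompts rather than raw vectors.

```python
import numpy as np

# Hypothetical five-level quality descriptor set (illustrative; the
# actual descriptors and their number are defined by the paper).
QUALITY_LEVELS = ["bad", "poor", "fair", "good", "excellent"]

def cosine_similarity(v, t):
    """Cosine similarity between a visual and a textual feature vector."""
    return float(np.dot(v, t) / (np.linalg.norm(v) * np.linalg.norm(t)))

def quality_distribution(visual_feat, text_feats, temperature=0.07):
    """CLIP-style mapping: similarities against each quality descriptor's
    textual feature, scaled by a temperature and softmax-normalized into
    a probability distribution over quality levels."""
    sims = np.array([cosine_similarity(visual_feat, t) for t in text_feats])
    logits = sims / temperature
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def expected_score(probs, scores=(1, 2, 3, 4, 5)):
    """A scalar quality score can still be recovered from the predicted
    distribution as the expectation over the level scores."""
    return float(np.dot(probs, scores))

# Illustrative usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
v = rng.normal(size=512)       # visual feature (assumed dimension)
T = rng.normal(size=(5, 512))  # one textual feature per descriptor
p = quality_distribution(v, T)
```

During training, the paper matches this predicted distribution against the Opinion Score Distribution (OSD) collected from subjects, instead of regressing a single MOS, which is what lets the model capture rater variability.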
Problem

Research questions and friction points this paper is trying to address.

Point Cloud Quality Assessment
Human Perception Alignment
Relevance-based Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

CLIP-PCQA
Feature Similarity
Probabilistic Quality Assessment