R4-CGQA: Retrieval-based Vision Language Models for Computer Graphics Image Quality Assessment

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses two gaps in computer graphics (CG) image quality assessment: existing CG datasets lack systematic descriptions of rendering quality, and existing assessment methods cannot provide text-based explanations. To bridge these gaps, the authors define six perceptual dimensions from a user-centric perspective, construct a dataset of 3,500 CG images annotated with fine-grained quality descriptions, and build corresponding visual question-answering benchmarks. They propose a two-stream retrieval-augmented generation framework that improves the discriminative capability of vision-language models by retrieving descriptions of visually similar images to inform quality judgments. Experiments on several representative vision-language models show that the approach substantially improves their CG quality assessment performance.
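The benchmarks are used to evaluate VLM responses against answers derived from the quality descriptions. As a rough illustration only, the sketch below assumes a multiple-choice format and computes accuracy per perceptual dimension. The `QARecord` schema, the option-letter answers, and the dimension names `sharpness` and `lighting` are hypothetical placeholders, not the paper's actual format or dimensions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QARecord:
    """One answered benchmark question (hypothetical schema)."""
    dimension: str   # perceptual dimension the question probes (names assumed)
    predicted: str   # option letter chosen by the VLM, e.g. "B"
    gold: str        # reference option letter


def per_dimension_accuracy(records):
    """Return {dimension: accuracy} plus an 'overall' entry."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        # Normalize answers so " b" and "B" count as the same option.
        hit = r.predicted.strip().upper() == r.gold.strip().upper()
        for key in (r.dimension, "overall"):
            total[key] += 1
            correct[key] += int(hit)
    return {k: correct[k] / total[k] for k in total}


# Hypothetical answers for two placeholder dimensions.
records = [
    QARecord("sharpness", "A", "A"),
    QARecord("sharpness", "c", "B"),
    QARecord("lighting", " b", "B"),
]
print(per_dimension_accuracy(records))
```

Reporting accuracy per dimension, rather than a single score, makes it visible which perceptual aspects a given VLM handles poorly.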

📝 Abstract

Immersive Computer Graphics (CG) rendering has become ubiquitous in modern daily life. However, comprehensively evaluating CG quality remains challenging for two reasons: first, existing CG datasets lack systematic descriptions of rendering quality; second, existing CG quality assessment methods cannot provide reasonable text-based explanations. To address these issues, we first identify six key perceptual dimensions of CG quality from the user perspective and construct a dataset of 3,500 CG images with corresponding quality descriptions. Each description covers CG style, content, and perceived quality along the selected dimensions. Furthermore, we use a subset of the dataset to build several question-answer benchmarks based on the descriptions in order to evaluate the responses of existing Vision Language Models (VLMs). We find that current VLMs are not sufficiently accurate in judging fine-grained CG quality, but that descriptions of visually similar images can significantly improve a VLM's understanding of a given CG image. Motivated by this observation, we adopt retrieval-augmented generation and propose a two-stream retrieval framework that effectively enhances the CG quality assessment capabilities of VLMs. Experiments on several representative VLMs demonstrate that our method substantially improves their performance on CG quality assessment.
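The core observation, that descriptions of visually similar images help a VLM judge a target image, follows the generic retrieve-then-prompt pattern. The sketch below shows a single retrieval stream under stated assumptions: image embeddings are precomputed by some encoder (not specified here), similarity is cosine, and the retrieved descriptions are prepended to the question. How the paper defines and combines its two streams is not described in this summary, so this is not the authors' method. The functions `retrieve` and `build_prompt`, the bank entries, and the prompt wording are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_emb, bank, k=2):
    """Return descriptions of the k bank entries most similar to query_emb.

    bank: list of (embedding, description) pairs; the embedding model
    used to produce them is an assumption, not given in the source.
    """
    scored = sorted(bank, key=lambda item: cosine(query_emb, item[0]), reverse=True)
    return [desc for _, desc in scored[:k]]


def build_prompt(question, references):
    """Prepend retrieved quality descriptions as in-context references."""
    lines = ["Reference descriptions of visually similar CG images:"]
    lines += [f"{i}. {d}" for i, d in enumerate(references, 1)]
    lines += ["", f"Question about the target image: {question}"]
    return "\n".join(lines)


# Hypothetical 2-D embeddings and descriptions, for illustration only.
bank = [
    ([1.0, 0.0], "Soft shadows, mild aliasing on edges."),
    ([0.0, 1.0], "Stylized cartoon scene, clean textures."),
    ([0.9, 0.1], "Realistic interior, noisy global illumination."),
]
refs = retrieve([1.0, 0.05], bank, k=2)
print(build_prompt("How noticeable is aliasing?", refs))
```

In practice the bank would hold the dataset's annotated images, and the resulting prompt would be passed to the VLM together with the target image.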

Problem

Research questions and friction points this paper is trying to address.

Computer Graphics

Image Quality Assessment

Vision Language Models

Rendering Quality

Text-based Explanation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-augmented generation

Vision Language Models

Computer Graphics Quality Assessment