GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current multimodal large language models (MLLMs) lack rigorous evaluation of their understanding and generation capabilities regarding fine-grained geometric optics principles. Method: We introduce GOBench—the first benchmark dedicated to geometric optics—comprising two core tasks: optical image generation and phenomenon understanding. We propose novel evaluation dimensions, including optical fidelity and instruction adherence, release the GOBench-Gen-1k synthetic generation dataset, and establish a standardized understanding evaluation protocol. Our methodology integrates high-quality scene-aware prompting, human subjective assessment, domain-specific evaluation instructions, and comparative testing across 11 state-of-the-art MLLMs. Results: Experiments reveal pervasive principle-level errors: GPT-4o-Image fails to achieve optical fidelity in generation, and Gemini-2.5-Pro attains only 37.35% accuracy on understanding tasks—demonstrating severe capability gaps. This work establishes the first physics-grounded, multimodal evaluation framework for geometric optics, addressing a critical void in the field.

📝 Abstract
The rapid evolution of Multi-modality Large Language Models (MLLMs) is driving significant advancements in visual understanding and generation. Nevertheless, a comprehensive assessment of their capabilities with respect to fine-grained physical principles, especially geometric optics, remains underexplored. To address this gap, we introduce GOBench, the first benchmark to systematically evaluate MLLMs' ability across two tasks: 1) Generating Optically Authentic Imagery and 2) Understanding Underlying Optical Phenomena. We curate high-quality prompts of geometric optical scenarios and use MLLMs to construct the GOBench-Gen-1k dataset. We then organize subjective experiments to assess the generated imagery on Optical Authenticity, Aesthetic Quality, and Instruction Fidelity, revealing generation flaws that violate optical principles. For the understanding task, we apply crafted evaluation instructions to test the optical understanding ability of eleven prominent MLLMs. The experimental results demonstrate that current models face significant challenges in both optical generation and understanding. The top-performing generative model, GPT-4o-Image, cannot perfectly complete all generation tasks, and the best-performing MLLM, Gemini-2.5-Pro, attains a mere 37.35% accuracy in optical understanding.
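The understanding track reports plain answer accuracy (e.g., Gemini-2.5-Pro's 37.35%). A minimal sketch of how such multiple-choice scoring might be computed is below; the item fields (`question`, `choices`, `answer`) and the option-letter matching rule are illustrative assumptions, not the paper's actual protocol:

```python
# Hypothetical scoring sketch for a GOBench-style understanding task.
# Field names and the letter-matching rule are assumptions; the paper's
# actual evaluation instructions may differ.

def score_understanding(items, predict):
    """Return accuracy of `predict` over multiple-choice optics items."""
    if not items:
        return 0.0
    correct = 0
    for item in items:
        pred = predict(item["question"], item["choices"])
        # Compare the predicted option letter to the gold answer.
        if pred.strip().upper() == item["answer"].upper():
            correct += 1
    return correct / len(items)

# Toy example with a stand-in model that always answers "A".
items = [
    {"question": "A ray passes from air into water. It bends...",
     "choices": {"A": "toward the normal", "B": "away from the normal"},
     "answer": "A"},
    {"question": "An object beyond C of a concave mirror forms an image that is...",
     "choices": {"A": "virtual and upright", "B": "real and inverted"},
     "answer": "B"},
]
always_a = lambda q, c: "A"
print(f"{score_understanding(items, always_a):.2%}")  # 50.00%
```

A real harness would also need to parse free-form model output down to an option letter before comparison, which is where much of the evaluation effort typically goes.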
Problem

Research questions and friction points this paper is trying to address.

Assessing MLLMs' geometric optics generation and understanding
Evaluating optical authenticity and aesthetic quality of generated imagery
Testing MLLMs' accuracy in understanding underlying optical phenomena
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces GOBench for MLLM evaluation
Assesses the optical authenticity of generated imagery
Tests MLLMs' understanding of optics
Authors
Xiaorong Zhu — Shanghai Jiao Tong University
Ziheng Jia — Shanghai Jiao Tong University / Shanghai AI Lab (LLM and LMM on Visual Quality Assessment)
Jiarui Wang — Shanghai Jiao Tong University
Xiangyu Zhao — Shanghai Jiao Tong University, Shanghai AI Laboratory
Haodong Duan — Shanghai AI Lab | CUHK | PKU (Computer Vision, Video Understanding, Multimodal Learning, Generative AI)
Xiongkuo Min — Shanghai Jiao Tong University
Jia Wang — Shanghai Jiao Tong University
Zicheng Zhang — Shanghai Jiao Tong University, Shanghai AI Laboratory
Guangtao Zhai — Professor, IEEE Fellow, Shanghai Jiao Tong University (Multimedia Signal Processing, Visual Quality Assessment, QoE, AI Evaluation, Displays)