PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing physics reasoning benchmarks primarily rely on text-only inputs or evaluate only final answers, neglecting critical intermediate steps such as variable identification and process modeling, and thus fail to comprehensively assess the physical reasoning capabilities of multimodal large language models (MLLMs). Method: We introduce PhysicsArena, the first multimodal physics reasoning benchmark tailored for MLLMs, featuring a novel "variable–process–solution" tripartite reasoning framework quantified via structured annotations. It integrates heterogeneous modalities (images, mathematical formulas, and text), leverages multimodal prompt engineering and physics-knowledge injection, and covers 12 classical physics scenarios with over 3,000 high-quality, multi-step reasoning samples. Contribution/Results: PhysicsArena enables fine-grained, interpretable evaluation of MLLMs' physical reasoning, significantly improving assessment reliability and model discriminability. It bridges two critical gaps: the lack of multimodal physics reasoning benchmarks and the absence of process-oriented, stepwise evaluation protocols.

📝 Abstract
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in diverse reasoning tasks, yet their application to complex physics reasoning remains underexplored. Physics reasoning presents unique challenges, requiring grounding in physical conditions and the interpretation of multimodal information. Current physics benchmarks are limited, often focusing on text-only inputs or solely on problem-solving, thereby overlooking the critical intermediate steps of variable identification and process formulation. To address these limitations, we introduce PhysicsArena, the first multimodal physics reasoning benchmark designed to holistically evaluate MLLMs across three critical dimensions: variable identification, physical process formulation, and solution derivation. PhysicsArena aims to provide a comprehensive platform for assessing and advancing the multimodal physics reasoning abilities of MLLMs.
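The three evaluation dimensions described in the abstract can be illustrated with a small sketch of what a per-dimension annotated sample and scorer might look like. All field names, the example problem, and the overlap-based scoring below are illustrative assumptions, not the benchmark's released schema or metric.

```python
# Hypothetical sketch of a PhysicsArena-style annotated sample and a
# stepwise scorer; field names and scoring rule are assumptions for
# illustration, not the paper's actual data format.

sample = {
    "question": "A 2 kg block slides down a frictionless 30 deg incline. "
                "Find its acceleration.",
    "image": "incline_diagram.png",  # multimodal input; path is a placeholder
    "variables": ["m = 2 kg", "theta = 30 deg", "g = 9.8 m/s^2"],
    "process": [
        "Resolve gravity along the incline: F = m * g * sin(theta)",
        "Apply Newton's second law: a = F / m = g * sin(theta)",
    ],
    "solution": "a = 9.8 * 0.5 = 4.9 m/s^2",
}

def stepwise_score(predicted: dict, gold: dict) -> dict:
    """Score each reasoning dimension separately, not just the final answer."""
    def overlap(pred, ref):
        pred, ref = set(pred), set(ref)
        return len(pred & ref) / len(ref) if ref else 1.0
    return {
        "variables": overlap(predicted.get("variables", []), gold["variables"]),
        "process": overlap(predicted.get("process", []), gold["process"]),
        "solution": float(predicted.get("solution") == gold["solution"]),
    }

# A model that identifies all variables but skips process steps and
# gets the answer wrong is still credited on the variable dimension.
scores = stepwise_score(
    {"variables": sample["variables"], "solution": "wrong"}, sample
)
```

The point of the sketch is the evaluation shape: a final-answer-only benchmark would collapse `scores` to a single 0/1, whereas the tripartite framework keeps the partial credit on variable identification visible.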
Problem

Research questions and friction points this paper is trying to address.

Evaluate MLLMs in multimodal physics reasoning tasks
Address limitations of text-only physics benchmarks
Assess variable identification, process formulation, solution derivation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal benchmark for physics reasoning
Evaluates variable, process, solution dimensions
Comprehensive MLLM assessment platform
Authors
Song Dai
The Hong Kong University of Science and Technology (Guangzhou); Beijing Future Brain Education Technology Co., Ltd.; The Hong Kong University of Science and Technology
Yibo Yan
East China Normal University
Jiamin Su
The Hong Kong University of Science and Technology (Guangzhou)
Dongfang Zihao
The Hong Kong University of Science and Technology (Guangzhou)
Yubo Gao
The Hong Kong University of Science and Technology (Guangzhou)
Yonghua Hei
The Hong Kong University of Science and Technology (Guangzhou); Beijing Future Brain Education Technology Co., Ltd.; The Hong Kong University of Science and Technology
Jungang Li
The Hong Kong University of Science and Technology (Guangzhou); Beijing Future Brain Education Technology Co., Ltd.
Junyan Zhang
National University of Singapore
Sicheng Tao
The Hong Kong University of Science and Technology (Guangzhou)
Zhuoran Gao
The Hong Kong University of Science and Technology (Guangzhou); Beijing Future Brain Education Technology Co., Ltd.; The Hong Kong University of Science and Technology
Xuming Hu
Assistant Professor, HKUST(GZ) / HKUST