🤖 AI Summary
Existing physical reasoning benchmarks rely primarily on text-only inputs or evaluate only final answers, neglecting critical intermediate steps such as variable identification and process modeling, and thus fail to comprehensively assess the physical reasoning capabilities of multimodal large language models (MLLMs).
Method: We introduce PhysicsArena, the first multimodal physics reasoning benchmark tailored for MLLMs, featuring a novel "variable–process–solution" tripartite reasoning framework quantified via structured annotations. It integrates heterogeneous modalities (images, mathematical formulas, and text), leverages multimodal prompt engineering and physics-knowledge injection, and covers 12 classical physics scenarios with over 3,000 high-quality, multi-step reasoning samples.
Contribution/Results: PhysicsArena enables fine-grained, interpretable evaluation of MLLMs' physical reasoning, significantly improving both the reliability of assessment and the ability to discriminate between models. It bridges two critical gaps: the lack of multimodal physical reasoning benchmarks and the absence of process-oriented, stepwise evaluation protocols.
📝 Abstract
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in diverse reasoning tasks, yet their application to complex physics reasoning remains underexplored. Physics reasoning presents unique challenges, requiring grounding in physical conditions and the interpretation of multimodal information. Current physics benchmarks are limited, often focusing on text-only inputs or solely on problem-solving, thereby overlooking the critical intermediate steps of variable identification and process formulation. To address these limitations, we introduce PhysicsArena, the first multimodal physics reasoning benchmark designed to holistically evaluate MLLMs across three critical dimensions: variable identification, physical process formulation, and solution derivation. PhysicsArena aims to provide a comprehensive platform for assessing and advancing the multimodal physics reasoning abilities of MLLMs.
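To make the "variable–process–solution" framework concrete, the sketch below shows one plausible shape for a stage-annotated sample and a naive per-stage scorer. This is an illustration only: the paper's actual schema and metrics are not given here, and every class, field, and function name (`PhysicsSample`, `score_sample`, etc.) is invented for this sketch.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PhysicsSample:
    """One multimodal problem with tripartite annotations (hypothetical schema)."""
    question: str           # problem statement (text, possibly with LaTeX formulas)
    image_paths: List[str]  # diagram(s) accompanying the problem
    variables: List[str]    # stage 1: physical quantities to identify
    processes: List[str]    # stage 2: governing physical processes/equations
    solution: str           # stage 3: final derivation and answer

def score_sample(pred: PhysicsSample, gold: PhysicsSample) -> dict:
    """Naive stage-wise scoring: exact-match recall on variables and processes,
    exact match on the final solution. A real benchmark would likely use fuzzier
    matching or an LLM judge; this only illustrates evaluating each stage
    separately instead of the final answer alone."""
    var_recall = (
        sum(v in pred.variables for v in gold.variables) / len(gold.variables)
        if gold.variables else 1.0
    )
    proc_recall = (
        sum(p in pred.processes for p in gold.processes) / len(gold.processes)
        if gold.processes else 1.0
    )
    return {
        "variables": var_recall,
        "processes": proc_recall,
        "solution": float(pred.solution.strip() == gold.solution.strip()),
    }
```

The point of the per-stage breakdown is that a model can reach a correct final answer while misidentifying variables or modeling the wrong process; scoring each stage separately exposes such failures that answer-only evaluation hides.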