🤖 AI Summary
This work proposes ViSA-R2, a framework that recovers analytical solutions from visualizations of 2D steady-state physical fields and their first-order derivatives, with the aim of supporting AI-driven scientific reasoning. The method is an end-to-end pipeline that emulates a physicist's reasoning in four stages: structure identification, analytical hypothesis formulation, parameter derivation, and consistency verification. It is aligned with a self-verifying, solution-centric chain-of-thought mechanism. ViSA-R2 is built on Qwen3-VL (8B) and outputs executable SymPy expressions. The study also releases ViSA-Bench, the first benchmark supporting vision-language models for symbolic reasoning in physics, and evaluates predictions on three metrics: numerical accuracy, expression-structure similarity, and character-level accuracy. Experiments across 30 linear steady-state scenarios show that ViSA-R2 outperforms both open-source and the evaluated closed-source vision-language models in inferring symbolic expressions.
📝 Abstract
Recovering analytical solutions of physical fields from visual observations is a fundamental yet underexplored capability for AI-assisted scientific reasoning. We study visual-to-symbolic analytical solution inference (ViSA) for two-dimensional linear steady-state fields: given field visualizations (and first-order derivatives) plus minimal auxiliary metadata, the model must output a single executable SymPy expression with fully instantiated numeric constants. We introduce ViSA-R2 and align it with a self-verifying, solution-centric chain-of-thought pipeline that follows a physicist-like pathway: structural pattern recognition → solution-family (ansatz) hypothesis → parameter derivation → consistency verification. We also release ViSA-Bench, a VLM-ready synthetic benchmark covering 30 linear steady-state scenarios with verifiable analytical/symbolic annotations, and evaluate predictions by numerical accuracy, expression-structure similarity, and character-level accuracy. Using an 8B open-weight Qwen3-VL backbone, ViSA-R2 outperforms strong open-source baselines and the evaluated closed-source frontier VLMs under a standardized protocol.
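To make the task output and two of the checks concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Laplace-type field on the unit square as the example scenario, and the helper names (`parse_prediction`, `laplace_residual`, `relative_l2_error`), the sampling grid, and the relative L2 error are illustrative assumptions rather than the metric definitions used in ViSA-Bench.

```python
import numpy as np
import sympy as sp

# Spatial coordinates of the 2D field (symbol names are an assumption).
x, y = sp.symbols("x y", real=True)


def parse_prediction(text):
    """Turn a model's output string into a SymPy expression in x and y.

    Note: sympify evaluates the string, so only use it on trusted input.
    """
    return sp.sympify(text, locals={"x": x, "y": y})


def laplace_residual(expr):
    """Consistency check for one assumed scenario: u_xx + u_yy should be 0."""
    return sp.simplify(sp.diff(expr, x, 2) + sp.diff(expr, y, 2))


def relative_l2_error(pred, truth, n=64):
    """Hypothetical numerical-accuracy score: relative L2 error on a grid."""
    xs, ys = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n))
    f_pred = sp.lambdify((x, y), pred, "numpy")
    f_true = sp.lambdify((x, y), truth, "numpy")
    # broadcast_to guards against expressions that reduce to a constant.
    p = np.broadcast_to(f_pred(xs, ys), xs.shape)
    t = np.broadcast_to(f_true(xs, ys), xs.shape)
    return float(np.linalg.norm(p - t) / np.linalg.norm(t))


# Illustrative ground truth: a harmonic field on the unit square.
truth = sp.sin(sp.pi * x) * sp.sinh(sp.pi * y) / sp.sinh(sp.pi)

# Illustrative prediction with fully instantiated numeric constants.
pred = parse_prediction("sin(3.14159*x)*sinh(3.14159*y)/sinh(3.14159)")

truth_residual = laplace_residual(truth)
error = relative_l2_error(pred, truth)
```

Here `truth_residual` simplifies to zero because the ground-truth field is harmonic, and `error` is small because the predicted constants differ from π only in the sixth decimal place. Structure similarity and character-level accuracy would need their own definitions, which the abstract does not specify.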