Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

📅 2026-04-13

🤖 AI Summary

This work addresses a data bottleneck in physical reasoning for large language models. Unlike mathematics, physics lacks large-scale, real-world question-answer (QA) datasets for training reasoning-capable models. The authors use physics simulators to generate randomized scenes and interactions, convert the simulated outcomes into synthetic QA pairs, and train models on these pairs with reinforcement learning. Simulators thus serve as a scalable source of supervision that does not depend on internet-sourced QA data. Models trained exclusively on this synthetic data improve by 5–10 percentage points on unseen, real International Physics Olympiad (IPhO) problems across model sizes, demonstrating zero-shot sim-to-real transfer.
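
The recipe summarized above has three steps: randomize a scene, simulate it, and read the answer off the simulation. It can be illustrated in a few lines of Python. This is a minimal sketch under our own assumptions, not the authors' pipeline. The incline scenario, the parameter ranges, g = 9.81 m/s^2, and the names `QAPair`, `simulate_incline`, and `make_incline_qa` are all hypothetical. A hand-written semi-implicit Euler loop stands in for a real physics engine.

```python
import math
import random
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


@dataclass
class QAPair:
    question: str
    answer: float  # ground truth read off the simulation, not a closed-form solution


def simulate_incline(angle_deg: float, mu: float, length: float, dt: float = 1e-4) -> float:
    """Time (s) for a block released from rest to slide `length` metres down a rough incline.

    A semi-implicit Euler loop stands in for a real physics engine (hypothetical stand-in).
    """
    theta = math.radians(angle_deg)
    accel = G * (math.sin(theta) - mu * math.cos(theta))
    if accel <= 0:
        # Friction cancels the downhill component of gravity: the block never moves.
        raise ValueError("block does not slide for these parameters")
    x, v, t = 0.0, 0.0, 0.0
    while x < length:
        v += accel * dt  # update velocity first (semi-implicit Euler)
        x += v * dt
        t += dt
    return t


def make_incline_qa(rng: random.Random) -> QAPair:
    """Sample a random incline scene, simulate it, and phrase the outcome as a QA pair."""
    # Round parameters first so the question states exactly what was simulated.
    # With angle >= 20 deg and mu <= 0.3, tan(angle) > mu, so the block always slides.
    angle = round(rng.uniform(20.0, 60.0), 1)
    mu = round(rng.uniform(0.0, 0.3), 2)
    length = round(rng.uniform(1.0, 5.0), 2)
    t = simulate_incline(angle, mu, length)
    question = (
        f"A block is released from rest on a {angle} degree incline with kinetic "
        f"friction coefficient {mu}. How long, in seconds, does it take to slide "
        f"{length} m? Take g = {G} m/s^2."
    )
    return QAPair(question=question, answer=round(t, 3))


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        pair = make_incline_qa(rng)
        print(pair.question, "->", pair.answer)
```

Seeding the `random.Random` instance makes each generated dataset reproducible. Swapping in other scene samplers, such as collisions, springs, or pendulums, would only change the sampling and question template.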

📝 Abstract

We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, and this reliance is a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets for effectively training reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on synthetic simulated data improves performance on IPhO (International Physics Olympiad) problems by 5–10 percentage points across model sizes. These results demonstrate that physics simulators can act as scalable data generators, enabling LLMs to acquire deep physical reasoning skills beyond the limitations of internet-scale QA data. Code is available at https://sim2reason.github.io/.
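
Reinforcement learning on simulator-generated QA pairs needs a reward that can be checked automatically. One common choice, assumed here rather than taken from the paper, is a binary reward. It compares the model's final numeric answer against the simulator's ground truth within a relative tolerance. The `\boxed{...}` answer convention, the 2% tolerance, and the helper names `extract_final_number` and `numeric_reward` are hypothetical.

```python
import math
import re
from typing import Optional

# Assumed output convention: the model ends its reasoning with \boxed{<number>}.
_NUM = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"
_BOXED = re.compile(r"\\boxed\{\s*(" + _NUM + r")\s*\}")
_ANY_NUMBER = re.compile(_NUM)


def extract_final_number(completion: str) -> Optional[float]:
    """Return the last \\boxed{} number, else the last number in the text, else None."""
    boxed = _BOXED.findall(completion)
    if boxed:
        return float(boxed[-1])
    numbers = _ANY_NUMBER.findall(completion)
    return float(numbers[-1]) if numbers else None


def numeric_reward(completion: str, sim_answer: float, rel_tol: float = 0.02) -> float:
    """Binary reward: 1.0 if the final number is within rel_tol of the simulator's answer, else 0.0."""
    pred = extract_final_number(completion)
    if pred is None or not math.isfinite(pred):
        return 0.0
    return 1.0 if math.isclose(pred, sim_answer, rel_tol=rel_tol, abs_tol=1e-9) else 0.0
```

In this sketch the ground truth comes from the simulator rather than a human-written solution, so every randomly generated scene yields a checkable training signal. The tolerance, rather than an exact match, absorbs rounding in both the model's arithmetic and the simulator's numerical integration.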

Problem

Research questions and friction points this paper is trying to address.

physics reasoning

large language models

QA datasets

data scarcity

physical sciences

Innovation

Methods, ideas, or system contributions that make the work stand out.

physics simulation

reinforcement learning

synthetic data generation