Superstudent intelligence in thermodynamics

📅 2025-06-11
🤖 AI Summary
This study investigates the performance of large language models (LLMs) on high-difficulty engineering examinations requiring principled, first-principles reasoning—specifically, a university-level thermodynamics final exam. Under a strict zero-shot, prompt-free evaluation protocol, all responses were manually graded by domain experts using the same rubric applied to over 10,000 student exams spanning 39 years. Results show that OpenAI’s latest reasoning-focused model, o3, achieved perfect scores across all problems and ranked in the top percentile of historical student performance, outperforming the entire current student cohort. This constitutes the first empirical demonstration that an LLM can systematically surpass human examinees on domain-specific, principle-driven engineering assessments, not through pattern matching or memorization but through deep physical reasoning. The finding represents a qualitative leap in machine intelligence’s capacity for logical inference in foundational engineering disciplines and carries profound implications for rethinking engineering pedagogy and the evolving role of engineers.
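The ranking claim above amounts to placing o3's exam score within the distribution of historical student scores. A minimal sketch of that comparison, using entirely hypothetical scores (the paper does not publish the raw grade distribution):

```python
def percentile_rank(score, historical_scores):
    """Percentage of historical scores strictly below `score`."""
    below = sum(1 for s in historical_scores if s < score)
    return 100.0 * below / len(historical_scores)

# Hypothetical illustration: a perfect-score result against a toy
# sample of past exam scores (values invented for this sketch).
history = [42, 55, 61, 70, 78, 85, 93]
print(percentile_rank(95, history))  # prints 100.0 — above every score in the sample
```

A strict-inequality count is one common convention for percentile rank; ties and interpolation schemes would shift the exact figure slightly, but not the qualitative "top of the distribution" conclusion.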

📝 Abstract
In this short note, we report and analyze a striking event: OpenAI's large language model o3 has outwitted all students in a university exam on thermodynamics. The thermodynamics exam is a difficult hurdle for most students, where they must show that they have mastered the fundamentals of this important topic. Consequently, the failure rates are very high, A-grades are rare, and they are considered proof of the students' exceptional intellectual abilities. This is because pattern learning does not help in the exam. The problems can only be solved by knowledgeably and creatively combining principles of thermodynamics. We have given our latest thermodynamics exam not only to the students but also to OpenAI's most powerful reasoning model, o3, and have assessed the answers of o3 exactly the same way as those of the students. In zero-shot mode, the model o3 solved all problems correctly, better than all students who took the exam; its overall score was in the range of the best scores we have seen in more than 10,000 similar exams since 1985. This is a turning point: machines now excel in complex tasks, usually taken as proof of human intellectual capabilities. We discuss the consequences this has for the work of engineers and the education of future engineers.
Problem

Research questions and friction points this paper is trying to address.

AI outperforms students in thermodynamics exam
Machines excel in complex intellectual tasks
Implications for engineering education and practice
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used OpenAI's o3 model for thermodynamics exam
Achieved top scores in zero-shot mode
Scored among the best of over 10,000 student exams since 1985
Rebecca Loubet
Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Germany
Pascal Zittlau
Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Germany
Marco Hoffmann
Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Germany
Luisa Vollmer
Visual Information Analysis Research Group (VIA), RPTU Kaiserslautern, Germany
Sophie Fellenz
Machine Learning Research Group (ML), RPTU Kaiserslautern, Germany
Heike Leitte
Professor of Computer Science, TU Kaiserslautern
Visualization, Visual Analytics, Data Science
Fabian Jirasek
Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern
Chemical Engineering, Bioprocess Engineering, Thermodynamics, Machine Learning
Johannes Lenhard
Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern, Germany; Philosophy in Science and Engineering, RPTU Kaiserslautern, Germany
Hans Hasse
University of Kaiserslautern
Chemical Engineering