SoM-1K: A Thousand-Problem Benchmark Dataset for Strength of Materials

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing foundation models lack rigorous evaluation on complex multimodal engineering reasoning—particularly in materials mechanics—due to the absence of domain-specific benchmarks. Method: We introduce SoM-1K, the first professional-scale benchmark comprising 1,065 annotated problems, each accompanied by a textual description and a schematic diagram. To mitigate visual misinterpretation, we propose Descriptions of Images (DoI), a prompting strategy that replaces raw image inputs with expert-authored, structured textual descriptions. Our evaluation framework integrates multimodal data curation, knowledge-guided image-to-text conversion, and comparative assessment across LLMs and VLMs. Results: Current models achieve limited performance (max accuracy: 56.6%). Crucially, DoI-enhanced pure language models consistently outperform vision-dependent multimodal models, demonstrating that text-based representations yield greater robustness for engineering reasoning. This establishes a new paradigm for high-reliability engineering AI grounded in semantically precise, vision-agnostic reasoning.

📝 Abstract
Foundation models have shown remarkable capabilities in various domains, but their performance on complex, multimodal engineering problems remains largely unexplored. We introduce SoM-1K, the first large-scale multimodal benchmark dataset dedicated to evaluating foundation models on problems in the strength of materials (SoM). The dataset, which contains 1,065 annotated SoM problems, mirrors real-world engineering tasks by including both textual problem statements and schematic diagrams. Due to the limited capabilities of current foundation models in understanding complicated visual information, we propose a novel prompting strategy called Descriptions of Images (DoI), which provides rigorous expert-generated text descriptions of the visual diagrams as the context. We evaluate eight representative foundation models, including both large language models (LLMs) and vision language models (VLMs). Our results show that current foundation models struggle significantly with these engineering problems, with the best-performing model achieving only 56.6% accuracy. Interestingly, we found that LLMs, when provided with DoI, often outperform VLMs provided with visual diagrams. A detailed error analysis reveals that DoI plays a crucial role in mitigating visual misinterpretation errors, suggesting that accurate text-based descriptions can be more effective than direct image input for current foundation models. This work establishes a rigorous benchmark for engineering AI and highlights a critical need for developing more robust multimodal reasoning capabilities in foundation models, particularly in scientific and engineering contexts.
Problem

Research questions and friction points this paper is trying to address.

Evaluating foundation models on complex multimodal engineering problems
Assessing AI performance on strength of materials problems with diagrams
Addressing visual misinterpretation errors in engineering AI applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created the SoM-1K benchmark dataset for strength of materials problems
Introduced Descriptions of Images prompting strategy
Showed that expert text descriptions can outperform direct image input
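The DoI strategy amounts to a text-only prompt in which an expert-written description of the schematic stands in for the raw image. The paper does not publish its exact prompt template, so the following is a minimal, hypothetical sketch of how such a prompt might be assembled (the function name, wording, and example problem are illustrative assumptions, not the authors' actual template):

```python
def build_doi_prompt(problem_text: str, image_description: str) -> str:
    """Assemble a text-only DoI prompt: the expert-written description
    of the schematic diagram replaces the image input entirely.
    Sketch only -- the paper's real template is not published."""
    return (
        "You are solving a strength of materials problem.\n\n"
        f"Problem statement:\n{problem_text}\n\n"
        "Description of the schematic diagram (expert-written):\n"
        f"{image_description}\n\n"
        "Reason step by step and state the final numerical answer."
    )

# Hypothetical example in the style of an SoM-1K problem:
problem = (
    "A simply supported beam of length L = 2 m carries a central "
    "point load P = 10 kN. Find the maximum bending moment."
)
doi = (
    "The diagram shows a horizontal beam with a pin support at the "
    "left end, a roller support at the right end, and a downward "
    "arrow labeled P at midspan."
)
prompt = build_doi_prompt(problem, doi)
```

The resulting string can then be sent to any text-only LLM, which is what lets pure language models compete with, and here outperform, VLMs given the raw diagram.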
Qixin Wan
College of Civil Engineering, Hunan University, Changsha, 410082, China
Zilong Wang
College of Civil Engineering, Hunan University, Changsha, 410082, China
Jingwen Zhou
College of Civil Engineering, Hunan University, Changsha, 410082, China
Wanting Wang
London School of Economics
social psychology, machine learning, neural networks
Ziheng Geng
Department of Civil & Architectural Engineering, University of Miami, Coral Gables, FL 33146, USA
Jiachen Liu
Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL 33146, USA
Ran Cao
College of Civil Engineering, Hunan University, Changsha, 410082, China
Minghui Cheng
Department of Civil & Architectural Engineering, University of Miami, Coral Gables, FL 33146, USA; School of Architecture, University of Miami, Coral Gables, FL 33146, USA
Lu Cheng
Assistant Professor, UIC CS
Socially Responsible AI, Causal Machine Learning, Data Mining, AI for Good