MathGen: Revealing the Illusion of Mathematical Competence through Text-to-Image Generation

📅 2026-03-29
🤖 AI Summary
This study investigates whether current text-to-image generation models can preserve mathematical correctness when a solution must be rendered visually, for example as a diagram, plot, or geometric construction. The authors construct MathGen, a benchmark of 900 problems spanning seven core mathematical domains, and pair each problem with an executable verifier under a Script-as-a-Judge protocol so that evaluation is deterministic and objective. Experiments show that even the best closed-source model achieves only 42.0% overall accuracy, while open-source models score roughly 1-11% and are often near 0% on structured tasks. These results indicate that current text-to-image models remain far from competent at even elementary mathematical visual generation.
📝 Abstract
Modern generative models have demonstrated the ability to solve challenging mathematical problems. In many real-world settings, however, mathematical solutions must be expressed visually through diagrams, plots, geometric constructions, and structured symbolic layouts, where correctness depends on precise visual composition. Can generative models still solve such problems when the answer must be rendered visually rather than written in text? To study this question, we introduce MathGen, a rigorous benchmark of 900 problems spanning seven core domains, each paired with an executable verifier under a Script-as-a-Judge protocol for deterministic and objective evaluation. Experiments on representative open-source and proprietary text-to-image (T2I) models show that mathematical fidelity remains a major bottleneck: even the best closed-source model reaches only 42.0% overall accuracy, while open-source models achieve just 1-11%, often near 0% on structured tasks. Overall, current T2I models remain far from competent at even elementary mathematical visual generation.
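To make the idea of an executable verifier concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a hypothetical task ("draw a filled rectangle of a given width and height") and a hypothetical image representation (a grid of 0/1 pixels standing in for a generated image). The names `bounding_box`, `verify_rectangle`, and `accuracy` are invented for illustration. The point is only that a script can return a deterministic pass or fail, which is then averaged into an accuracy score.

```python
from typing import List, Tuple

# Hypothetical stand-in for a generated image: 1 = ink, 0 = background.
Grid = List[List[int]]


def bounding_box(grid: Grid) -> Tuple[int, int]:
    """Return (width, height) of the smallest box containing all ink pixels."""
    cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]
    if not cells:
        return (0, 0)
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (max(cols) - min(cols) + 1, max(rows) - min(rows) + 1)


def verify_rectangle(grid: Grid, width: int, height: int) -> bool:
    """Pass only if the ink forms a completely filled width x height rectangle."""
    ink = sum(sum(row) for row in grid)
    return bounding_box(grid) == (width, height) and ink == width * height


def accuracy(results: List[bool]) -> float:
    """Fraction of tasks whose verifier passed (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


# Two hypothetical model outputs for the task "filled 3 x 2 rectangle".
good = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
bad = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 0, 1, 0],   # one missing pixel: the shape is no longer a filled rectangle
       [0, 0, 0, 0, 0]]

results = [verify_rectangle(good, 3, 2), verify_rectangle(bad, 3, 2)]
print(accuracy(results))  # 0.5
```

A check of this kind is strict by design: an output that looks roughly right but violates the stated constraint fails, which is consistent with the low accuracies reported above.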
Problem

Research questions and friction points this paper is trying to address.

text-to-image generation
mathematical competence
visual reasoning
generative models
mathematical fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

text-to-image generation
mathematical visual reasoning
executable verification
Script-as-a-Judge
generative model evaluation
Authors

Ruiyao Liu
University of Pennsylvania

Hui Shen
University of Michigan, Ph.D. Student in Computer Science (2025.9-?)
Efficient AI, Generative Model, Machine Learning System

Ping Zhang
The Ohio State University
Data Mining, Deep Learning, Causal AI, Multimodal LLM, AI in Medicine

Yunta Hsieh
University of Michigan

Yifan Zhang
University of Michigan

Jing Xu
USTC

Sicheng Chen
The Ohio State University

Junchen Li
City University of Hong Kong

Jiawei Lu
University of Wisconsin

Jianing Ma
Independent

Jiaqi Mo
University of Wisconsin

Qi Han
Independent

Zhen Zhang
UCSB
NLP

Zhongwei Wan
The Ohio State University, PhD student
LLM, Multimodal, NLP

Jing Xiong
The University of Hong Kong
Natural Language Processing, Automated Theorem Proving

Xin Wang
The Ohio State University
Efficient AI

Ziyuan Liu
Unknown affiliation
Robotics, Manipulation and Grasping, Computer Vision, Machine Learning

Hangrui Cao
Carnegie Mellon University

Ngai Wong
University of Hong Kong