🤖 AI Summary
Automated generation of pedagogically effective visual aids for mathematical word problems (MWPs) remains underexplored in mathematics education.
Method: We introduce Math2Visual, the first visualization-generation framework tailored for math education. It is grounded in teacher interviews, which inform an education-oriented visual language and design space along with interpretable, measurable criteria for visual representations. Using Math2Visual, we construct a benchmark dataset of 1,903 image-text pairs and fine-tune text-to-image (TTI) models (e.g., Stable Diffusion), complemented by a multidimensional human evaluation protocol.
Contribution/Results: Fine-tuned models represent mathematical relationships significantly more accurately than off-the-shelf TTI models. Our analysis reveals, for the first time, systematic failures of existing TTI models in capturing quantitative relations and operational logic, both critical dimensions for MWP understanding. We establish a novel evaluation standard centered on instructional effectiveness, providing both methodological foundations and empirical evidence for explainable, educationally grounded visual generation in AI-enhanced learning.
📝 Abstract
Visuals are valuable tools for teaching math word problems (MWPs), helping young learners translate textual descriptions into mathematical expressions before solving them. However, creating such visuals is labor-intensive, and automated methods to support this process are lacking. In this paper, we present Math2Visual, an automatic framework for generating pedagogically meaningful visuals from MWP text descriptions. Math2Visual leverages a pre-defined visual language and a design space, grounded in interviews with math teachers, to illustrate the core mathematical relationships in MWPs. Using Math2Visual, we construct an annotated dataset of 1,903 visuals and evaluate Text-to-Image (TTI) models for their ability to generate visuals that align with our design. We further fine-tune several TTI models on our dataset, demonstrating improvements in educational visual generation. Our work establishes a new benchmark for automated generation of pedagogically meaningful visuals and offers insights into key challenges in producing multimodal educational content, such as the misrepresentation of mathematical relationships and the omission of essential visual elements.