VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions

📅 2025-10-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated evaluation of handwritten mathematical solutions is hard to make both accurate and interpretable because of diverse handwriting, unstructured layouts, and complex symbolic structure. To address this, the paper proposes VEHME, an expression-aware vision-language modeling framework with two components: (1) an Expression-Aware Visual Prompting Module, trained on a synthetically generated multi-line mathematical expression dataset, that guides spatial attention toward mathematical symbols and structural elements in visually heterogeneous inputs; and (2) a two-phase training pipeline that combines supervised fine-tuning on structured reasoning data with reinforcement learning aligned to multi-dimensional grading objectives. The resulting model produces fine-grained scores, assesses reasoning depth, and localizes errors with interpretable granularity. Evaluated on the AIHub and FERMAT benchmarks, VEHME achieves state-of-the-art performance among open-source models and approaches the accuracy of commercial systems, while remaining scalable and fully open.

📝 Abstract
Automatically assessing handwritten mathematical solutions is an important problem in educational technology with practical applications, but it remains a significant challenge due to the diverse formats, unstructured layouts, and symbolic complexity of student work. To address this challenge, we introduce VEHME (a Vision-Language Model for Evaluating Handwritten Mathematics Expressions), designed to assess open-form handwritten math responses with high accuracy and interpretable reasoning traces. VEHME integrates a two-phase training pipeline: (i) supervised fine-tuning using structured reasoning data, and (ii) reinforcement learning that aligns model outputs with multi-dimensional grading objectives, including correctness, reasoning depth, and error localization. To enhance spatial understanding, we propose an Expression-Aware Visual Prompting Module, trained on our synthesized multi-line math expressions dataset to robustly guide attention in visually heterogeneous inputs. Evaluated on AIHub and FERMAT datasets, VEHME achieves state-of-the-art performance among open-source models and approaches the accuracy of proprietary systems, demonstrating its potential as a scalable and accessible tool for automated math assessment. Our training and experiment code is publicly available at our GitHub repository.
Problem

Research questions and friction points this paper is trying to address.

Automatically assessing handwritten mathematical solutions in education
Addressing diverse formats and symbolic complexity of student work
Providing interpretable grading with correctness and error localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Model for handwritten math evaluation
Two-phase training with fine-tuning and reinforcement learning
Expression-Aware Visual Prompting for spatial understanding
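The multi-dimensional grading objective behind the reinforcement learning phase can be pictured as a scalar reward that blends correctness, reasoning depth, and error localization. The sketch below is a minimal illustration under assumed definitions; the weights, component scorers, and function names are hypothetical and not taken from the paper:

```python
def grading_reward(pred_score: float, gold_score: float,
                   reasoning_steps: int, expected_steps: int,
                   error_iou: float,
                   w_correct: float = 0.6,
                   w_depth: float = 0.2,
                   w_local: float = 0.2) -> float:
    """Illustrative scalar reward in [0, 1] for one graded response.

    All weights and component definitions are assumptions for this
    sketch, not the paper's actual reward formulation.
    """
    # Correctness: how close the predicted grade is to the reference grade
    # (scores assumed normalized to [0, 1]).
    correctness = 1.0 - min(abs(pred_score - gold_score), 1.0)
    # Reasoning depth: fraction of the expected solution steps produced,
    # capped at full credit.
    depth = min(reasoning_steps / max(expected_steps, 1), 1.0)
    # Error localization: overlap (e.g. IoU) between predicted and
    # annotated error regions, clamped to [0, 1].
    localization = max(0.0, min(error_iou, 1.0))
    return w_correct * correctness + w_depth * depth + w_local * localization
```

A policy-gradient method would then optimize the model against this reward; the three-way decomposition mirrors the grading objectives named in the abstract (correctness, reasoning depth, error localization).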