🤖 AI Summary
Existing vision-language models exhibit limited performance on multi-step, complex reasoning tasks, primarily because conventional reward mechanisms provide only a coarse-grained, binary global score and lack fine-grained, verifiable feedback on subproblem correctness.
Method: We propose StructVRM, a structured and verifiable reward model that combines semantic parsing with verification of mathematical and logical equivalence to assign formally verifiable partial scores at the subproblem level, overcoming the limitations of rigid string matching and global scoring. Its model-based, end-to-end reward modeling framework enables fine-grained optimization of reasoning paths during reinforcement learning.
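To make the idea concrete, here is a minimal toy sketch of structured, equivalence-based partial-credit scoring. The function names (`parse_value`, `equivalent`, `structured_reward`) are hypothetical illustrations, and exact-`Fraction` parsing stands in for the paper's model-based semantic parser and verifier; the real StructVRM verifier is a trained model, not a rule-based parser.

```python
from fractions import Fraction

def parse_value(ans: str):
    """Normalize a free-form numeric answer (toy stand-in for
    StructVRM's semantic parsing, which is model-based)."""
    try:
        return Fraction(ans.strip().replace(" ", ""))
    except (ValueError, ZeroDivisionError):
        return None

def equivalent(pred: str, gold: str) -> bool:
    """Mathematical equivalence rather than string matching:
    '0.5' and '1/2' count as the same answer."""
    p, g = parse_value(pred), parse_value(gold)
    return p is not None and p == g

def structured_reward(preds, golds):
    """Sub-question-level reward: each sub-answer is verified
    independently, and the total is the fraction correct."""
    per_sub = [1.0 if equivalent(p, g) else 0.0
               for p, g in zip(preds, golds)]
    return sum(per_sub) / len(per_sub), per_sub

reward, per_sub = structured_reward(["1/2", "3.0", "7"],
                                    ["0.5", "3", "8"])
# reward = 2/3: two of three sub-questions verified as equivalent,
# whereas a binary global score would assign 0 to the whole response.
```

The per-sub-question scores are what give the reinforcement-learning signal its granularity: a response that solves two of three parts is rewarded more than one that solves none, which a single global binary score cannot express.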
Contribution/Results: Evaluated on 12 public multimodal benchmarks, Seed-StructVRM achieves state-of-the-art performance on 6 of them and significantly outperforms prior methods on our newly constructed, high-difficulty STEM-Bench. To our knowledge, this is the first work to realize structured, verifiable reward alignment for multimodal reasoning processes.
📝 Abstract
Existing vision-language models often struggle with complex, multi-question reasoning tasks in which partial correctness is crucial for effective learning. Traditional reward mechanisms, which provide a single binary score for an entire response, are too coarse to guide models through intricate problems with multiple sub-parts. To address this, we introduce StructVRM, a method that aligns multimodal reasoning with Structured and Verifiable Reward Models. At its core is a model-based verifier trained to provide fine-grained, sub-question-level feedback, assessing semantic and mathematical equivalence rather than relying on rigid string matching. This enables nuanced, partial-credit scoring in previously intractable problem formats. Extensive experiments demonstrate the effectiveness of StructVRM. Our trained model, Seed-StructVRM, achieves state-of-the-art performance on six out of twelve public multimodal benchmarks and on our newly curated, high-difficulty STEM-Bench. The success of StructVRM validates that training with structured, verifiable rewards is a highly effective approach for advancing the capabilities of multimodal models in complex, real-world reasoning domains.