🤖 AI Summary
This study addresses the low-resource Vietnamese legal domain by introducing VietTrafficLaw, the first multimodal legal question answering (MLQA) benchmark focused on traffic sign regulations. Existing legal AI systems, however, lack support for multimodal legal reasoning in low-resource languages. To address this, we propose a text–image joint modeling framework that enables cross-modal semantic alignment and information fusion, supporting both multimodal legal retrieval and question answering. Our primary contribution is the release of the first publicly available Vietnamese multimodal legal dataset—comprising traffic sign images, statutory text excerpts, and expert-annotated question-answer pairs—alongside standardized evaluation protocols. Experimental results show that the best-performing model achieves an F2 score of 64.55% on retrieval and 86.30% accuracy on QA, advancing multimodal legal intelligence and systematic evaluation frameworks for Vietnamese and other low-resource languages.
📝 Abstract
This paper presents VLSP 2025 MLQA-TSR, the shared task on multimodal legal question answering for traffic sign regulation at VLSP 2025. VLSP 2025 MLQA-TSR comprises two subtasks: multimodal legal retrieval and multimodal question answering. The goal is to advance research on Vietnamese multimodal legal text processing and to provide a benchmark dataset for building and evaluating intelligent systems in multimodal legal domains, with a focus on traffic sign regulation in Vietnam. The best-reported results on VLSP 2025 MLQA-TSR are an F2 score of 64.55% for multimodal legal retrieval and an accuracy of 86.30% for multimodal question answering.
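The retrieval subtask is scored with the F2 score, the Fβ measure with β = 2, which weights recall twice as heavily as precision — a common choice for legal retrieval, where missing a relevant statute is costlier than returning an extra one. A minimal sketch of the standard Fβ formula follows; note that the shared task's exact evaluation protocol (e.g., per-query versus corpus-level averaging) is not specified here, so this illustrates only the metric itself.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta=2 (the F2 score) favors recall over precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical retrieval run: precision 0.50, recall 0.80
print(round(f_beta(0.50, 0.80), 4))  # -> 0.7143
```

Because of the recall weighting, a system with precision 0.50 and recall 0.80 scores higher under F2 (about 0.71) than under the balanced F1 (about 0.62).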