ENC-Bench: A Benchmark for Evaluating Multimodal Large Language Models in Electronic Navigational Chart Understanding

πŸ“… 2026-03-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work investigates whether multimodal large language models (MLLMs) can understand electronic navigational charts (ENCs): specialized maritime data built from standardized vector symbols, scale-dependent rendering, and precise geometric structure. To this end, we introduce ENC-Bench, the first benchmark for ENC comprehension, constructed from 840 real NOAA ENCs. Through a calibrated vector-to-image rendering pipeline, we generate 20,490 expert-validated samples spanning a three-tier evaluation framework: perception, spatial reasoning, and maritime decision-making. Under a unified zero-shot protocol, we evaluate ten state-of-the-art MLLMs, revealing that even the best-performing model achieves only 47.88% accuracy. This highlights systemic deficiencies in symbol grounding, spatial computation, multi-constraint reasoning, and robustness to variations in illumination and scale.

πŸ“ Abstract
Electronic Navigational Charts (ENCs) are the safety-critical backbone of modern maritime navigation, yet it remains unclear whether multimodal large language models (MLLMs) can reliably interpret them. Unlike natural images or conventional charts, ENCs encode regulations, bathymetry, and route constraints via standardized vector symbols, scale-dependent rendering, and precise geometric structure -- requiring specialized maritime expertise for interpretation. We introduce ENC-Bench, the first benchmark dedicated to professional ENC understanding. ENC-Bench contains 20,490 expert-validated samples from 840 authentic National Oceanic and Atmospheric Administration (NOAA) ENCs, organized into a three-level hierarchy: Perception (symbol and feature recognition), Spatial Reasoning (coordinate localization, bearing, distance), and Maritime Decision-Making (route legality, safety assessment, emergency planning under multiple constraints). All samples are generated from raw S-57 data through a calibrated vector-to-image pipeline with automated consistency checks and expert review. We evaluate 10 state-of-the-art MLLMs, including GPT-4o, Gemini 2.5, Qwen3-VL, InternVL-3, and GLM-4.5V, under a unified zero-shot protocol. The best model achieves only 47.88% accuracy, with systematic challenges in symbolic grounding, spatial computation, multi-constraint reasoning, and robustness to lighting and scale variations. By establishing the first rigorous ENC benchmark, we open a new research frontier at the intersection of specialized symbolic reasoning and safety-critical AI, providing essential infrastructure for advancing MLLMs toward professional maritime applications.
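To make the Spatial Reasoning tier concrete, the bearing and distance questions it describes reduce to standard great-circle geometry over chart coordinates. The sketch below is illustrative only -- it uses the textbook haversine and initial-bearing formulas on a spherical Earth, not the paper's actual ground-truth pipeline (which works from raw S-57 vector data and would typically use the WGS-84 ellipsoid):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean spherical radius; real ENC work would use WGS-84

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing in degrees, clockwise from true north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
```

For example, one degree of longitude along the equator is about 111.2 km, due east (bearing 090). A benchmark answer checker could score model responses against such ground truth within a fixed tolerance.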
Problem

Research questions and friction points this paper is trying to address.

Electronic Navigational Charts
Multimodal Large Language Models
Maritime Navigation
Symbolic Reasoning
Safety-Critical AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Electronic Navigational Chart
Multimodal Large Language Models
Benchmark
Symbolic Reasoning
Safety-Critical AI
Ao Cheng
National University of Defense Technology
Xingming Li
National University of Defense Technology
Xuanyu Ji
National University of Defense Technology
Xixiang He
National University of Defense Technology
Qiyao Sun
Queen Mary University of London
Chunping Qiu
Intelligent Game and Decision Lab
Runke Huang
The Chinese University of Hong Kong, Shenzhen
Qingyong Hu
University of Oxford