DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models

📅 2025-11-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Pointer-type meter reading recognition is critical for intelligent power systems but remains challenging due to reflections, occlusions, viewpoint variations, and visual ambiguity between thin pointers and scale markings; progress is further hindered by the absence of large-scale benchmark datasets. To address these issues, we introduce RPM-10K, the first real-synthetic hybrid dataset, comprising over 10,000 annotated meter images. We further propose MRLM, a physics-informed vision-language model that explicitly encodes geometric constraints and causal reasoning as physical priors, achieves multimodal alignment via cross-attention, and generalizes to complex scenarios through an adaptive expert selection mechanism. On RPM-10K, MRLM significantly outperforms state-of-the-art methods, achieving high accuracy (MAE < 0.15°) and superior robustness. Both the code and dataset are publicly released.

📝 Abstract
The precise reading recognition of pointer meters plays a key role in smart power systems, but existing approaches remain fragile under challenges such as reflections, occlusions, dynamic viewing angles, and visual ambiguity between thin pointers and scale markings. Moreover, the field still lacks large-scale datasets to support the development of robust algorithms. To address these challenges, this paper first presents a new large-scale benchmark dataset for dial reading, termed RPM-10K, which contains 10,730 meter images that fully reflect the aforementioned key challenges. Built upon this dataset, we propose a novel vision-language model for pointer meter reading recognition, termed MRLM, based on physical relation injection. Instead of exhaustively learning image-level correlations, MRLM explicitly encodes the geometric and causal relationships between the pointer and the scale, aligning perception with physical reasoning in the spirit of world-model perspectives. Through cross-attentional fusion and adaptive expert selection, the model learns to interpret dial configurations and generate precise numeric readings. Extensive experiments validate the effectiveness of the proposed framework on the new benchmark. Both the dataset and source code will be released at https://github.com/Event-AHU/DialBench
Problem

Research questions and friction points this paper is trying to address.

Lack of a large-scale dataset capturing real-world pointer meter reading challenges
Need for a model that encodes the geometric and causal relationships between pointer and scale
Fragile reading accuracy under difficult conditions such as reflections, occlusions, and varying viewpoints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale benchmark dataset RPM-10K for dial reading
Vision-language model MRLM with physical relation injection
Cross-attentional fusion and adaptive expert selection for reasoning
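The physical prior at the heart of dial reading is the geometric mapping from pointer angle to numeric value. The paper does not publish its exact formulation, but the standard linear angle-to-reading relation it builds on can be sketched as follows (function name and gauge parameters are illustrative, not from the paper):

```python
def angle_to_reading(pointer_deg: float, start_deg: float, end_deg: float,
                     min_val: float, max_val: float) -> float:
    """Linearly interpolate a dial reading from the pointer's angle.

    Angles are measured clockwise from the scale's zero mark, so the
    reading is min_val plus the swept fraction of the value range.
    """
    frac = (pointer_deg - start_deg) / (end_deg - start_deg)
    return min_val + frac * (max_val - min_val)

# Hypothetical example: a 0-1.6 MPa gauge whose scale sweeps 270 degrees;
# a pointer at 135 degrees sits halfway along the scale.
print(angle_to_reading(135.0, 0.0, 270.0, 0.0, 1.6))  # 0.8
```

Under this relation, the reported MAE < 0.15° in pointer angle translates directly into a small, bounded reading error once the scale's angular span and value range are known.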
Futian Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Chaoliu Weng
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Xiao Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Zhen Chen
Department of Computer Science and Information Technology, La Trobe University, Bendigo, Australia
Zhicheng Zhao
Associate Professor at the School of Artificial Intelligence, Anhui University
Computer Vision
Jin Tang
Anhui University
Computer vision; intelligent video analysis