DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models

📅 2025-11-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Pointer-type meter reading recognition is critical for intelligent power systems but remains challenging due to reflections, occlusions, viewpoint variations, and visual ambiguity between thin pointers and scale markings; progress is further hindered by the absence of large-scale benchmark datasets. To address these issues, we introduce RPM-10K, the first real-synthetic hybrid dataset, comprising over 10,000 annotated meter images. We further propose MRLM, a physics-informed vision-language model that explicitly encodes geometric constraints and causal reasoning as physical priors, achieves multimodal alignment via cross-attention, and generalizes to complex scenarios through an adaptive expert selection mechanism. On RPM-10K, MRLM significantly outperforms state-of-the-art methods, achieving high accuracy (MAE < 0.15°) and superior robustness. Both the code and dataset are publicly released.

📝 Abstract
The precise reading recognition of pointer meters plays a key role in smart power systems, but existing approaches remain fragile under challenges such as reflections, occlusions, dynamic viewing angles, and visual ambiguity between thin pointers and scale markings. Moreover, the field still lacks large-scale datasets to support the development of robust algorithms. To address these challenges, this paper first presents a new large-scale benchmark dataset for dial reading, termed RPM-10K, which contains 10,730 meter images that fully reflect the aforementioned key challenges. Built upon this dataset, we propose a novel vision-language model for pointer meter reading recognition, termed MRLM, based on physical relation injection. Instead of exhaustively learning image-level correlations, MRLM explicitly encodes the geometric and causal relationships between the pointer and the scale, aligning perception with physical reasoning in the spirit of world-model perspectives. Through cross-attentional fusion and adaptive expert selection, the model learns to interpret dial configurations and generate precise numeric readings. Extensive experiments validate the effectiveness of the proposed framework on the new benchmark. Both the dataset and source code will be released at https://github.com/Event-AHU/DialBench
Problem

Research questions and friction points this paper is trying to address.

Lack of a large-scale dataset capturing real-world pointer meter reading challenges
Need for a model that encodes the geometric and causal relationships between pointer and scale
Fragile reading accuracy under difficult conditions such as reflections, occlusions, and varying viewpoints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale benchmark dataset RPM-10K for dial reading
Vision-language model MRLM with physical relation injection
Cross-attentional fusion and adaptive expert selection for reasoning
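The physical prior at the heart of dial reading is the geometric mapping from pointer angle to numeric value. The paper does not publish its exact formulation, but the standard linear angle-to-reading relation it builds on can be sketched as follows (function name and gauge parameters are illustrative, not from the paper):

```python
def angle_to_reading(pointer_deg: float, start_deg: float, end_deg: float,
                     min_val: float, max_val: float) -> float:
    """Linearly interpolate a dial reading from the pointer's angle.

    Angles are measured clockwise from the scale's zero mark, so the
    reading is min_val plus the swept fraction of the value range.
    """
    frac = (pointer_deg - start_deg) / (end_deg - start_deg)
    return min_val + frac * (max_val - min_val)

# Hypothetical example: a 0-1.6 MPa gauge whose scale sweeps 270 degrees;
# a pointer at 135 degrees sits halfway along the scale.
print(angle_to_reading(135.0, 0.0, 270.0, 0.0, 1.6))  # 0.8
```

Under this relation, the reported MAE < 0.15° in pointer angle translates directly into a small, bounded reading error once the scale's angular span and value range are known.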
Futian Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Chaoliu Weng
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Xiao Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Zhen Chen
Department of Computer Science and Information Technology, La Trobe University, Bendigo, Australia
Zhicheng Zhao
Associate Professor at the School of Artificial Intelligence, Anhui University
Computer Vision
Jin Tang
Anhui University
Computer vision; intelligent video analysis