CAD2DMD-SET: Synthetic Generation Tool of Digital Measurement Device CAD Model Datasets for fine-tuning Large Vision-Language Models

📅 2025-08-29

🤖 AI Summary

Large Vision-Language Models (LVLMs) degrade noticeably when reading digital measurement devices (DMDs) in real-world scenes, especially under clutter, occlusion, motion blur, and extreme viewing angles. To address this, the authors propose CAD2DMD-SET, a synthetic data generation tool that uses 3D CAD models, rendering, and high-fidelity image composition to build DMD-specific datasets with visual question answering (VQA) annotations. They also introduce DMDBench, a curated set of 1,000 annotated real-world images for evaluating DMD reading. Fine-tuning three state-of-the-art LVLMs with LoRA on the generated data yields substantial gains in Average Normalised Levenshtein Similarity (ANLS); InternVL's score increases by 200% without degrading on other tasks. The tool is intended to let the community add new measurement devices and generate their own datasets.

📝 Abstract

Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities across various multimodal tasks. They continue, however, to struggle with seemingly trivial tasks such as reading values from Digital Measurement Devices (DMDs), particularly in real-world conditions involving clutter, occlusions, extreme viewpoints, and motion blur, all of which are common with head-mounted cameras and in Augmented Reality (AR) applications. Motivated by these limitations, this work introduces CAD2DMD-SET, a synthetic data generation tool designed to support visual question answering (VQA) tasks involving DMDs. By leveraging 3D CAD models, advanced rendering, and high-fidelity image composition, our tool produces diverse, VQA-labelled synthetic DMD datasets suitable for fine-tuning LVLMs. Additionally, we present DMDBench, a curated validation set of 1,000 annotated real-world images designed to evaluate model performance under practical constraints. Benchmarking three state-of-the-art LVLMs with Average Normalised Levenshtein Similarity (ANLS), and then fine-tuning LoRA adapters for these models on the dataset generated by CAD2DMD-SET, yielded substantial improvements, with InternVL showing a score increase of 200% without degrading on other tasks. This demonstrates that the CAD2DMD-SET training dataset substantially improves the robustness and performance of LVLMs under the challenging conditions described above. The CAD2DMD-SET tool is expected to be released as open source once the final version of this manuscript is prepared, allowing the community to add different measurement devices and generate their own datasets.
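
The abstract names ANLS as the evaluation metric without defining it. The sketch below follows the common formulation of ANLS from the scene-text VQA literature: each prediction is scored as one minus its normalised edit distance to the closest reference answer, scores whose normalised distance reaches a threshold are set to zero, and the results are averaged. The threshold of 0.5, the lowercasing and whitespace stripping, and the function names are assumptions for illustration; the paper's exact evaluation settings are not given in this summary.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(prediction: str, reference: str, threshold: float = 0.5) -> float:
    """1 - normalised edit distance, or 0 when the distance reaches the threshold."""
    # Assumed normalisation: case-insensitive, surrounding whitespace ignored.
    prediction, reference = prediction.strip().lower(), reference.strip().lower()
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    distance = levenshtein(prediction, reference) / longest
    return 1.0 - distance if distance < threshold else 0.0


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Average, over all questions, of the best similarity to any accepted answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    scores = [max(similarity(p, r, threshold) for r in refs)
              for p, refs in zip(predictions, references)]
    return sum(scores) / len(scores)


# Hypothetical readings: one exact, one off by a single digit, one unrelated.
example_score = anls(["12.5", "12.6", "abc"], [["12.5"], ["12.5"], ["xyz"]])
```

Under this formulation a reading that is wrong by one character out of four still earns partial credit (0.75), which is lenient for numeric displays where a single digit changes the measurement. A "score increase of 200%" means the fine-tuned score is three times the baseline score.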

Problem

Research questions and friction points this paper is trying to address.

Improving LVLMs' reading of digital measurement devices

Addressing challenges like clutter, occlusions, and motion blur

Generating synthetic datasets for fine-tuning vision-language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic data generation using 3D CAD models

Advanced rendering and high-fidelity image composition

Fine-tuning LVLMs with LoRA on VQA-labelled synthetic datasets
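
To make "VQA-labelled" concrete, here is a minimal sketch of how one synthetic sample might pair a rendered image with a question and a ground-truth answer. Because the generator chooses the value shown on the display, the label is known exactly and needs no manual annotation. Everything in the block is hypothetical: the record fields, the question wording, the value range, and the file path are illustrative assumptions and do not describe CAD2DMD-SET's actual output format.

```python
import json
import random
from typing import Dict


def make_vqa_sample(image_path: str, device: str, unit: str,
                    rng: random.Random) -> Dict[str, str]:
    """Build one hypothetical VQA record for a rendered device image."""
    # The generator picks the displayed value, so the answer is exact by construction.
    value = round(rng.uniform(0.0, 199.9), 1)  # assumed display range
    return {
        "image": image_path,
        "device": device,
        "question": f"What value is shown on the {device} display?",
        "answer": f"{value:.1f} {unit}",
    }


rng = random.Random(0)  # fixed seed so the example is reproducible
sample = make_vqa_sample("renders/multimeter_0001.png", "multimeter", "V", rng)
print(json.dumps(sample, indent=2))
```

In a full pipeline, the same sampled value would drive both the text rendered on the CAD model's display and the answer field, which keeps image and label consistent.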
