🤖 AI Summary
Extracting structured information from 2D engineering drawings remains challenging: conventional OCR is brittle on complex layouts and overlapping symbols, yielding unstructured outputs with high error rates.
Method: This paper proposes a hybrid framework integrating oriented bounding box (OBB) detection with the Donut document understanding Transformer. We introduce a novel single-model, cross-category joint fine-tuning strategy to mitigate hallucination and enhance generalization. OBB detection is implemented via YOLOv11, trained on a custom nine-class annotated dataset; structured JSON generation is incorporated into the fine-tuning pipeline.
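As a rough illustration (not the paper's code), the detection stage produces oriented bounding boxes, which are commonly parameterized as (center, width, height, angle); recovering the four corners for cropping is a small rotation computation. The function name and the exact parameterization convention here are assumptions for the sketch:

```python
import math

def obb_corners(cx, cy, w, h, angle_rad):
    """Return the four corner points of an oriented bounding box.

    The (cx, cy, w, h, angle) parameterization matches what OBB
    detection heads (e.g. YOLO's OBB variants) commonly emit; the
    exact convention used in the paper is an assumption here.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate each half-extent offset about the box center.
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners

# Sanity check: zero rotation reproduces the plain axis-aligned box.
print(obb_corners(50, 50, 20, 10, 0.0))
# → [(40.0, 45.0), (60.0, 45.0), (60.0, 55.0), (40.0, 55.0)]
```

Each corner polygon would then be used to crop the drawing region that is fed to the Donut parser.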
Results: The framework achieves 94.77% precision in geometric dimensioning and tolerancing (GD&T) recognition, 100% recall for most classes, and an overall F1-score of 97.3%, while reducing the hallucination rate to 5.23%. It substantially decreases manual annotation effort and supports industrial-scale deployment.
📝 Abstract
Accurate extraction of key information from 2D engineering drawings is crucial for high-precision manufacturing. Manual extraction is time-consuming and error-prone, while traditional Optical Character Recognition (OCR) techniques often struggle with complex layouts and overlapping symbols, resulting in unstructured outputs. To address these challenges, this paper proposes a novel hybrid deep learning framework for structured information extraction by integrating an oriented bounding box (OBB) detection model with a transformer-based document parsing model (Donut). An in-house annotated dataset is used to train YOLOv11 for detecting nine key categories: Geometric Dimensioning and Tolerancing (GD&T), General Tolerances, Measures, Materials, Notes, Radii, Surface Roughness, Threads, and Title Blocks. Detected OBBs are cropped into images and labeled to fine-tune Donut for structured JSON output. Two fine-tuning strategies are compared: a single model trained jointly across all categories and separate category-specific models. Results show that the single model consistently outperforms the category-specific ones across all evaluation metrics, achieving higher precision (94.77% for GD&T), recall (100% for most categories), and F1-score (97.3%), while reducing the hallucination rate to 5.23%. The proposed framework improves accuracy, reduces manual effort, and supports scalable deployment in precision-driven industries.
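The headline figures are internally consistent: F1 is the harmonic mean of precision and recall, and plugging in the reported GD&T precision (94.77%) with 100% recall reproduces the reported 97.3%:

```python
# Consistency check: F1 as the harmonic mean of the reported
# GD&T precision (94.77%) and recall (100%).
precision, recall = 0.9477, 1.0
f1 = 2 * precision * recall / (precision + recall)
print(round(f1 * 100, 1))  # → 97.3
```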