BLUEPRINT Rebuilding a Legacy: Multimodal Retrieval for Complex Engineering Drawings and Documents

📅 2026-02-12
🤖 AI Summary
This work addresses the long-standing challenge of retrieving engineering drawings and technical documents, which is hindered by missing or inconsistent metadata and heavy reliance on manual processing. The authors propose a layout-aware multimodal retrieval system that integrates standardized drawing region detection, region-constrained visual language model (VLM)-based OCR, identifier normalization, and a hybrid approach combining lexical and dense retrieval, enhanced by a lightweight region-level re-ranking mechanism. This framework enables automated structuring and cross-facility search over large-scale engineering archives. Notably, it is the first to combine layout awareness, region-level VLM-based OCR, and efficient re-ranking specifically tailored for complex engineering drawings. Evaluated on a benchmark of 5,000 documents, the method achieves an absolute 10.1% improvement in Success@3 and a relative 18.9% gain in nDCG@3, significantly outperforming existing vision-language baselines.
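The summary mentions a hybrid of lexical and dense retrieval but does not say how the two result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below as an illustration only; it is not confirmed as the authors' method. The function name, the constant `k=60`, and the drawing identifiers are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=3):
    """Fuse several ranked lists of document IDs into one ranked list.

    Assumption: RRF stands in for the unspecified fusion step; k=60 is a
    conventional default, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by ID for a deterministic order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical result lists from a lexical and a dense retriever.
lexical = ["DWG-1042", "DWG-0007", "DWG-3310"]
dense = ["DWG-0007", "DWG-2201", "DWG-1042"]
fused_top3 = reciprocal_rank_fusion([lexical, dense])
```

A document that appears near the top of both lists outranks one that appears in only a single list, which is the behavior a hybrid retriever typically wants before a reranker refines the top candidates.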

📝 Abstract
Decades of engineering drawings and technical records remain locked in legacy archives with inconsistent or missing metadata, making retrieval difficult and often manual. We present Blueprint, a layout-aware multimodal retrieval system designed for large-scale engineering repositories. Blueprint detects canonical drawing regions, applies region-restricted VLM-based OCR, normalizes identifiers (e.g., DWG, part, facility), and fuses lexical and dense retrieval with a lightweight region-level reranker. Deployed on ~770k unlabeled files, it automatically produces structured metadata suitable for cross-facility search. We evaluate Blueprint on a 5k-file benchmark with 350 expert-curated queries using pooled, graded (0/1/2) relevance judgments. Blueprint delivers a 10.1% absolute gain in Success@3 and an 18.9% relative improvement in nDCG@3 over the strongest vision-language baseline, consistently outperforming across vision, text, and multimodal intents. Oracle ablations reveal substantial headroom under perfect region detection and OCR. We release all queries, runs, annotations, and code to facilitate reproducible evaluation on legacy engineering archives.
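The two reported metrics can be made concrete with a minimal sketch over graded (0/1/2) judgments. Two choices here are assumptions, since the abstract does not specify them: Success@3 counts a query as successful when any top-3 result has grade 1 or higher, and nDCG uses the exponential gain 2^grade - 1 with a log2 rank discount. The function names are hypothetical.

```python
import math


def success_at_k(ranked_grades, k=3, min_grade=1):
    """Return 1.0 if any of the top-k results reaches min_grade, else 0.0.

    Assumption: grade >= 1 counts as a hit; the paper may use a different cutoff.
    """
    return 1.0 if any(g >= min_grade for g in ranked_grades[:k]) else 0.0


def dcg_at_k(grades, k=3):
    """Discounted cumulative gain with exponential gain (an assumed variant)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(ranked_grades, judged_grades, k=3):
    """Normalize DCG by the best ordering achievable from the judged pool."""
    ideal = dcg_at_k(sorted(judged_grades, reverse=True), k)
    return dcg_at_k(ranked_grades, k) / ideal if ideal > 0 else 0.0


# Hypothetical query: grades of the returned top 3, and all pooled judgments.
returned = [0, 2, 1]
pooled = [2, 1, 0, 0]
hit = success_at_k(returned)
score = ndcg_at_k(returned, pooled)
```

When reading the headline numbers, note that an absolute gain is a difference between two metric values (percentage points), while a relative improvement is that difference divided by the baseline value, so the two figures are not directly comparable.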
Problem

Research questions and friction points this paper is trying to address.

engineering drawings
legacy archives
metadata
multimodal retrieval
technical documents
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal retrieval
layout-aware OCR
engineering document understanding
metadata normalization
region-level reranking