🤖 AI Summary
This work addresses the long-standing challenge of retrieving engineering drawings and technical documents, which is hindered by missing or inconsistent metadata and heavy reliance on manual processing. The authors propose a layout-aware multimodal retrieval system that integrates detection of canonical drawing regions, region-restricted OCR based on a vision-language model (VLM), identifier normalization, and hybrid lexical and dense retrieval, followed by a lightweight region-level reranker. This framework enables automated metadata structuring and cross-facility search over large-scale engineering archives. Notably, it is the first to combine layout awareness, region-level VLM-based OCR, and efficient re-ranking specifically tailored for complex engineering drawings. Evaluated on a benchmark of 5,000 documents, the method achieves a 10.1% absolute improvement in Success@3 and an 18.9% relative gain in nDCG@3 over the strongest vision-language baseline.
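The hybrid lexical and dense retrieval step can be illustrated with a small sketch. The summary does not say how the two result lists are fused, so the example below uses reciprocal rank fusion (RRF), a common rank-based technique, purely as an assumption. The function name, the document IDs, and the constant `k=60` are illustrative and are not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Assumption: RRF stands in for whatever fusion the system actually uses.
    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical retriever and a dense retriever.
lexical_hits = ["d1", "d2", "d3"]
dense_hits = ["d2", "d3", "d4"]

fused = reciprocal_rank_fusion([lexical_hits, dense_hits])
print(fused)
```

Documents returned by both retrievers (`d2` and `d3` here) accumulate two contributions and rise above documents found by only one, which is the behavior a hybrid retriever typically wants before a reranker refines the top of the list.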
📝 Abstract
Decades of engineering drawings and technical records remain locked in legacy archives with inconsistent or missing metadata, making retrieval difficult and often manual. We present Blueprint, a layout-aware multimodal retrieval system designed for large-scale engineering repositories. Blueprint detects canonical drawing regions, applies region-restricted VLM-based OCR, normalizes identifiers (e.g., DWG, part, facility), and fuses lexical and dense retrieval with a lightweight region-level reranker. Deployed on ~770k unlabeled files, it automatically produces structured metadata suitable for cross-facility search. We evaluate Blueprint on a 5k-file benchmark with 350 expert-curated queries using pooled, graded (0/1/2) relevance judgments. Blueprint delivers a 10.1% absolute gain in Success@3 and an 18.9% relative improvement in nDCG@3 over the strongest vision-language baseline, and it consistently outperforms that baseline across vision, text, and multimodal query intents. Oracle ablations reveal substantial headroom under perfect region detection and OCR. We release all queries, runs, annotations, and code to facilitate reproducible evaluation on legacy engineering archives.
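The two reported metrics, and the difference between an absolute and a relative gain, can be made concrete with a short sketch. It assumes the common definitions: Success@k is 1 if any document with grade at least 1 appears in the top k, and nDCG@k uses the exponential gain 2^grade - 1 with a log2 rank discount. The abstract does not specify the exact gain function or success threshold, so both are assumptions, and the judgments and ranking below are invented for illustration only.

```python
import math


def success_at_k(ranked_ids, relevance, k=3, min_grade=1):
    """1.0 if any top-k document has a grade >= min_grade (assumed threshold)."""
    return 1.0 if any(relevance.get(d, 0) >= min_grade for d in ranked_ids[:k]) else 0.0


def dcg_at_k(grades, k):
    """Discounted cumulative gain with exponential gain (assumed variant)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(ranked_ids, relevance, k=3):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    grades = [relevance.get(d, 0) for d in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(grades, k) / idcg if idcg > 0 else 0.0


def absolute_gain(new, old):
    """Difference in metric value, e.g. 0.60 -> 0.70 is +0.10 (10 points)."""
    return new - old


def relative_gain(new, old):
    """Improvement as a fraction of the baseline, e.g. 0.50 -> 0.60 is +20%."""
    return (new - old) / old


# Hypothetical graded judgments (0/1/2) for one query and one system ranking.
judgments = {"a": 2, "b": 1, "c": 0}
ranking = ["c", "b", "a"]

print(success_at_k(ranking, judgments))
print(round(ndcg_at_k(ranking, judgments), 4))
```

In an evaluation like the one described, these per-query values would be averaged over all 350 queries before computing the absolute or relative gain against a baseline.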