Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models

📅 2026-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the extraction of structured diagnostic knowledge from industrial troubleshooting guides, where procedures are commonly drawn as flowcharts that combine visual layout with technical text. Manual extraction is labor-intensive and error-prone, which slows integration into operator support systems. The paper evaluates vision-language models (VLMs) that jointly parse the spatial arrangement and textual content of such diagrams. It compares two prompting strategies: standard instruction-based prompting and layout-aware prompting that cues troubleshooting-specific layout patterns. The results show model-specific trade-offs between layout sensitivity and semantic robustness, with the two evaluated VLMs responding differently to the two prompting strategies. These findings offer empirical guidance for choosing a model and a prompting strategy together in real-world deployments.

📝 Abstract

Industrial troubleshooting guides encode diagnostic procedures in flowchart-like diagrams where spatial layout and technical language jointly convey meaning. To integrate this knowledge into operator support systems, which assist shop-floor personnel in diagnosing and resolving equipment issues, the information must first be extracted and structured for machine interpretation. However, when performed manually, this extraction is labor-intensive and error-prone. Vision Language Models (VLMs) offer the potential to automate this process by jointly interpreting visual and textual meaning, yet their performance on such guides remains underexplored. This paper evaluates two VLMs on extracting structured knowledge, comparing two prompting strategies: standard instruction-guided prompting versus an augmented approach that cues troubleshooting layout patterns. Results reveal model-specific trade-offs between layout sensitivity and semantic robustness, informing practical deployment decisions.
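
To make the idea of "structured for machine interpretation" concrete, the following is a minimal Python sketch of what an extraction target and the two prompt variants might look like. Everything in it is an assumption made for illustration: the node and edge schema, the `kind` labels, the prompt wording, and the function names `build_prompt` and `parse_response` are hypothetical and are not taken from the paper. The call to an actual VLM is deliberately left out.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    id: str
    kind: str  # hypothetical labels, e.g. "symptom", "check", "action"
    text: str


@dataclass
class Edge:
    source: str
    target: str
    label: str = ""  # e.g. "yes" / "no" on a decision branch


@dataclass
class Procedure:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)


# Hypothetical prompt texts; the paper's actual prompts are not reproduced here.
BASE_PROMPT = (
    "Extract the troubleshooting procedure shown in the image as JSON with "
    "'nodes' (id, kind, text) and 'edges' (source, target, label)."
)
LAYOUT_HINT = (
    " The diagram typically reads from a symptom, through checks with "
    "branching outcomes, to corrective actions; follow the connectors."
)


def build_prompt(layout_aware: bool) -> str:
    """Return the standard prompt, optionally augmented with a layout cue."""
    return BASE_PROMPT + (LAYOUT_HINT if layout_aware else "")


def parse_response(raw: str) -> Procedure:
    """Parse a model's JSON reply and reject edges that point to unknown nodes."""
    data = json.loads(raw)
    nodes = [Node(**n) for n in data["nodes"]]
    edges = [Edge(**e) for e in data["edges"]]
    known_ids = {n.id for n in nodes}
    for e in edges:
        if e.source not in known_ids or e.target not in known_ids:
            raise ValueError(f"edge {e.source}->{e.target} references an unknown node")
    return Procedure(nodes=nodes, edges=edges)
```

A graph of typed nodes and labeled edges is one plausible target format because it keeps both the text of each step and the branching structure that the layout conveys. Validating edge endpoints at parse time is a simple guard against replies that name connections the model never defined as nodes.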

Problem

Research questions and friction points this paper is trying to address.

Procedural Knowledge Extraction

Industrial Troubleshooting Guides

Vision Language Models

Structured Knowledge

Machine Interpretation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision Language Models

Procedural Knowledge Extraction

Industrial Troubleshooting Guides