🤖 AI Summary
To address the low efficiency and poor accuracy of information retrieval and maintenance instruction generation for technicians handling heterogeneous multimodal data (e.g., text, images, 3D models) in XR environments, this paper proposes a cross-format Retrieval-Augmented Generation (RAG) framework tailored for industrial XR. The framework achieves unified retrieval via cross-modal semantic alignment and integrates large language models (LLMs), specifically GPT-4 and GPT-4o-mini, to generate context-aware maintenance instructions. Its key innovation is the first end-to-end integration of joint multimodal retrieval and LLM-based instruction generation within an XR runtime. Experimental results show a 37% improvement in instruction response accuracy and an average latency of 1.18 seconds; on complex queries, BLEU and METEOR scores reach 42.6 and 48.3, respectively, validating the framework's real-time performance, accuracy, and industrial applicability.
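The retrieve-then-generate flow described above can be sketched in miniature. The following is a toy illustration, not the paper's implementation: it stands in for the cross-modal encoder with a bag-of-words "embedding" and for the LLM call with a prompt string, and the corpus entries and function names are invented for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in; the paper's framework would use a shared
    # cross-modal encoder so text, image, and 3D-model entries are comparable.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Rank corpus entries by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    # Retrieved context is prepended so the LLM grounds its instructions in it;
    # the real system would send this prompt to GPT-4 or GPT-4o-mini.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Pump P-101 manual: replace the seal after 500 operating hours.",
    "Valve V-7 schematic: torque the flange bolts to 45 Nm.",
]
prompt = build_prompt("how to replace the pump seal", corpus)
```

The design point is the separation of concerns: retrieval narrows heterogeneous documentation down to relevant context, and generation only ever sees that context, which is what keeps the instructions grounded.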
📝 Abstract
This paper presents a detailed evaluation of a Retrieval-Augmented Generation (RAG) system that integrates large language models (LLMs) to enhance information retrieval and instruction generation for maintenance personnel across diverse data formats. We assessed the performance of eight LLMs, emphasizing key metrics such as response speed and accuracy, with accuracy quantified using BLEU and METEOR scores. Our findings reveal that advanced models like GPT-4 and GPT-4o-mini significantly outperform their counterparts, particularly on complex queries requiring multi-format data integration. The results validate the system's ability to deliver timely and accurate responses, highlighting the potential of RAG frameworks to optimize maintenance operations. Future research will focus on refining retrieval techniques for these models and enhancing response generation, particularly for intricate scenarios, ultimately improving the system's practical applicability in dynamic real-world environments.
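For readers unfamiliar with the accuracy metric used above, a minimal single-reference BLEU can be written in a few lines. This is a simplified sketch (crude smoothing, one reference, whitespace tokenization); the paper's exact evaluation setup is not specified here, and production evaluations typically use a library implementation instead.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> list[tuple]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    cand, ref = candidate.lower().split(), reference.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        # Clipped (modified) n-gram precision: each candidate n-gram only
        # counts up to the number of times it appears in the reference.
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth to avoid log(0)
    # Brevity penalty discourages trivially short candidates.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An exact match scores 1.0, and scores fall as n-gram overlap with the reference drops; METEOR additionally credits stem and synonym matches, which is why it is often reported alongside BLEU.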