Smarter, not Bigger: Fine-Tuned RAG-Enhanced LLMs for Automotive HIL Testing

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the fragmentation, low reusability, and development inefficiency of test assets in automotive Hardware-in-the-Loop (HIL) testing, this paper proposes HIL-GPT: a domain-adapted lightweight large language model (LLM) integrating Retrieval-Augmented Generation (RAG) and a fine-tuned semantic embedding model to enable traceable, bidirectional retrieval between requirements and test cases. Methodologically, we introduce a novel data curation pipeline combining heuristic mining and LLM-based synthesis to construct a high-quality, domain-specific dataset. We empirically validate that compact models achieve an optimal trade-off among accuracy, inference latency, and deployment cost—challenging the prevailing “bigger is better” assumption. A/B experiments demonstrate that HIL-GPT significantly outperforms general-purpose LLMs in practical utility, result reliability, and user satisfaction.
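The bidirectional requirement-to-test-case retrieval described above can be sketched with cosine similarity over embeddings. The embedding function below is a toy deterministic bag-of-words stand-in (not the paper's fine-tuned `bge-base-en-v1.5`), and the requirement/test-case strings are invented for illustration:

```python
import zlib
import numpy as np

def embed(text, dim=64):
    """Toy deterministic bag-of-words embedding: a hashing-based
    stand-in for a fine-tuned model such as bge-base-en-v1.5."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query, corpus, top_k=1):
    """Return the top_k corpus entries by cosine similarity to the query."""
    q = embed(query)
    scored = sorted(corpus, key=lambda doc: -float(embed(doc) @ q))
    return scored[:top_k]

# Hypothetical HIL artifacts, invented for this sketch.
requirements = [
    "REQ-12: The brake controller shall engage ABS below 5 km/h",
    "REQ-34: The HIL rig shall log CAN frames at 1 ms resolution",
]
test_cases = [
    "TC-07: Verify ABS engagement of the brake controller below 5 km/h",
    "TC-19: Check CAN frame logging resolution on the HIL rig",
]

# Requirement -> test case, and test case -> requirement (traceability).
print(retrieve(requirements[0], test_cases))
print(retrieve(test_cases[1], requirements))
```

The same index serves both directions because requirements and test cases live in one shared embedding space, which is what makes the traceability bidirectional.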

📝 Abstract
Hardware-in-the-Loop (HIL) testing is essential for automotive validation but suffers from fragmented and underutilized test artifacts. This paper presents HIL-GPT, a retrieval-augmented generation (RAG) system integrating domain-adapted large language models (LLMs) with semantic retrieval. HIL-GPT leverages embedding fine-tuning using a domain-specific dataset constructed via heuristic mining and LLM-assisted synthesis, combined with vector indexing for scalable, traceable test case and requirement retrieval. Experiments show that fine-tuned compact models, such as bge-base-en-v1.5, achieve a superior trade-off between accuracy, latency, and cost compared to larger models, challenging the notion that bigger is always better. An A/B user study further confirms that RAG-enhanced assistants improve perceived helpfulness, truthfulness, and satisfaction over general-purpose LLMs. These findings provide insights for deploying efficient, domain-aligned LLM-based assistants in industrial HIL environments.
Problem

Research questions and friction points this paper is trying to address.

Enhances automotive HIL test artifact retrieval and utilization
Optimizes accuracy, latency, and cost with fine-tuned compact models
Improves perceived helpfulness and truthfulness in industrial assistants
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned RAG system integrates domain-adapted LLMs with semantic retrieval
Embedding fine-tuning uses domain-specific dataset from heuristic mining and synthesis
Vector indexing enables scalable, traceable test case and requirement retrieval
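The vector-indexing contribution above can be illustrated with a minimal in-memory index. This is a brute-force sketch for clarity only; the paper does not specify its index backend, and a production system would typically use an approximate-nearest-neighbor library (e.g. FAISS) instead:

```python
import numpy as np

class VectorIndex:
    """Minimal brute-force in-memory vector index: a hypothetical
    stand-in for the production index backing HIL-GPT's retrieval."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = np.empty((0, dim))
        self.ids = []

    def add(self, doc_id, vector):
        # Normalize so the dot product equals cosine similarity.
        v = np.asarray(vector, dtype=float)
        v = v / (np.linalg.norm(v) or 1.0)
        self.vectors = np.vstack([self.vectors, v])
        self.ids.append(doc_id)

    def search(self, query, top_k=3):
        q = np.asarray(query, dtype=float)
        q = q / (np.linalg.norm(q) or 1.0)
        scores = self.vectors @ q
        order = np.argsort(-scores)[:top_k]
        return [(self.ids[i], float(scores[i])) for i in order]

# Usage with toy 3-d embeddings (invented for this sketch).
index = VectorIndex(3)
index.add("TC-07", [1.0, 0.0, 0.0])
index.add("TC-19", [0.0, 1.0, 0.0])
index.add("REQ-12", [0.7, 0.7, 0.0])
hits = index.search([0.9, 0.1, 0.0], top_k=2)
print(hits)
```

Because artifact IDs are stored alongside their vectors, every retrieved hit maps back to a concrete requirement or test case, which is what makes the retrieval traceable rather than purely generative.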
Chao Feng
University of Zurich
network, machine learning, cybersecurity
Zihan Liu
Communication Systems Group CSG, Department of Informatics IfI, University of Zurich UZH, 8050 Zürich, Switzerland
Siddhant Gupta
Volvo Car Corporation, 405 31 Göteborg, Sweden
Gongpei Cui
Volvo Car Corporation, 405 31 Göteborg, Sweden
Jan von der Assen
University of Zurich
Burkhard Stiller
Communication Systems Group CSG, Department of Informatics IfI, University of Zurich UZH, 8050 Zürich, Switzerland