Data Leakage in Automotive Perception: Practitioners' Insights

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study examines reliability risks in automotive perception systems arising from unintended data leakage between training and evaluation datasets, offering the first industrial-practice perspective showing that such leakage is fundamentally a socio-technical coordination challenge spanning roles. Through semi-structured interviews with ten automotive perception engineers and reflexive thematic analysis, the research finds that industry awareness of data leakage remains fragmented across roles, with mitigation relying largely on tacit knowledge transfer rather than systematic tooling. The work recommends establishing a unified definition of data leakage, implementing traceable data pipelines, and fostering continuous cross-functional communication to strengthen the reliability engineering of automotive machine learning systems.
📝 Abstract
Data leakage, the inadvertent transfer of information between training and evaluation datasets, poses a subtle yet critical risk to the reliability of machine learning (ML) models in safety-critical systems such as automotive perception. While leakage is widely recognized in research, little is known about how industrial practitioners actually perceive and manage it in practice. This study investigates practitioners' knowledge, experiences, and mitigation strategies around data leakage through ten semi-structured interviews with system design, development, and verification engineers working on automotive perception function development. Using reflexive thematic analysis, we identify that knowledge of data leakage is widespread but fragmented along role boundaries: ML engineers conceptualize it as a data-splitting or validation issue, whereas design and verification roles interpret it in terms of representativeness and scenario coverage. Detection commonly arises through general reasoning and observed performance anomalies rather than through specific tools. Prevention, by contrast, is more commonly practiced and depends mostly on experience and knowledge sharing. These findings suggest that leakage control is a socio-technical coordination problem distributed across roles and workflows. We discuss implications for ML reliability engineering, highlighting the need for shared definitions, traceable data practices, and continuous cross-role communication to institutionalize data leakage awareness within automotive ML development.
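To make the data-splitting form of leakage discussed in the abstract concrete: in automotive perception, consecutive frames from one recorded drive are near-duplicates, so a naive random split can place frames from the same drive in both training and evaluation sets. A minimal sketch of a group-aware split that avoids this is shown below; the field names (`session_id`, `frame`) and the function itself are illustrative assumptions, not tooling described in the paper.

```python
# Illustrative sketch (not from the paper): a group-aware train/test split
# that keeps all frames from one recording session on the same side of the
# boundary, so near-duplicate frames cannot leak across datasets.
import hashlib

def split_by_session(samples, test_fraction=0.2):
    """Deterministically route each sample by hashing its session_id."""
    train, test = [], []
    for sample in samples:
        digest = hashlib.sha256(sample["session_id"].encode()).digest()
        bucket = digest[0] / 255.0  # stable pseudo-random value in [0, 1]
        (test if bucket < test_fraction else train).append(sample)
    return train, test

# Hypothetical dataset: 10 drives, 5 frames each.
frames = [{"session_id": f"drive_{d:03d}", "frame": i}
          for d in range(10) for i in range(5)]
train, test = split_by_session(frames)

# Every session appears on exactly one side of the split.
assert not ({s["session_id"] for s in train}
            & {s["session_id"] for s in test})
```

Hashing the session ID (rather than shuffling rows) also keeps the split stable as new frames are added, which supports the kind of traceable data practice the authors advocate.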
Problem

Research questions and friction points this paper is trying to address.

data leakage
automotive perception
machine learning reliability
industrial practice
safety-critical systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

data leakage
automotive perception
socio-technical coordination
ML reliability engineering
practitioner study
Md Abu Ahammed Babu
Volvo Cars | University of Gothenburg and Chalmers University of Technology
Sushant Kumar Pandey
University of Groningen
Darko Durisic
Volvo Cars
András Bálint
Volvo Cars
Miroslaw Staron
Software engineering, University of Gothenburg
Software engineering, metrics, ISO, dependability, computer science