🤖 AI Summary
Modern automotive deep learning frameworks exhibit temperature-sensitive failures—such as GPU thermal throttling, computation latency, high/mixed-precision errors, and time-series synchronization anomalies—under extreme ambient temperatures (−40°C to 50°C). Existing testing methodologies neglect thermal effects and thus fail to detect such defects. Method: We propose the first temperature-aware testing framework for deep learning systems, modeling GPU thermal dynamics via Newton's law of cooling, integrating real-time temperature-driven frequency scaling, and generating test cases through operator-level model mutation rules tailored to thermal sensitivity. Contribution/Results: Evaluated on mainstream automotive AI frameworks, our approach successfully uncovers novel thermal-induced defects—including latency spikes, accuracy degradation, and synchronization failures—demonstrating its effectiveness in exposing environment-dependent vulnerabilities. This work bridges a critical gap in AI framework quality assurance by explicitly incorporating environmental temperature as a first-class testing dimension.
📝 Abstract
Deep learning models play a vital role in autonomous driving systems, supporting critical functions such as environmental perception. To accelerate model inference, the deployment of these models relies on automotive deep learning frameworks, for example, PaddleInference in Apollo and TensorRT in AutoWare. However, unlike cloud deployments, vehicular environments experience extreme ambient temperatures ranging from -40°C to 50°C, which significantly affect GPU temperature. In addition, heat generated during computation raises the GPU temperature further. These temperature fluctuations trigger dynamic GPU frequency adjustments through mechanisms such as dynamic voltage and frequency scaling (DVFS). However, automotive deep learning frameworks are designed without considering temperature-induced frequency variations. When deployed on temperature-varying GPUs, these frameworks suffer critical quality issues: compute-intensive operators incur delays or errors, high/mixed-precision operators produce precision errors, and time-series operators encounter synchronization issues. Existing deep learning framework testing methods cannot detect these issues because they ignore temperature's effect on framework quality. To bridge this gap, we propose ThermalGuardian, the first testing method for automotive deep learning frameworks under temperature-varying environments. Specifically, ThermalGuardian generates test input models using mutation rules targeting temperature-sensitive operators, simulates GPU temperature fluctuations based on Newton's law of cooling, and controls GPU frequency based on real-time GPU temperature.
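The two physical ingredients of the method—Newton's law of cooling with a compute-heating term, and a temperature-driven clock throttle in the style of DVFS—can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cooling constant `k`, the heating term, and the 60°C/90°C throttling thresholds are all assumed values chosen for the example.

```python
def simulate_gpu_temperature(t_ambient, t_init, heat_load, k=0.05,
                             steps=100, dt=1.0):
    """Euler integration of dT/dt = -k*(T - T_ambient) + heat_load.

    The first term is Newton's law of cooling toward the ambient
    temperature; heat_load models heating from sustained computation.
    Returns the temperature trace (°C) over time.
    """
    temps = [t_init]
    t = t_init
    for _ in range(steps):
        t += dt * (-k * (t - t_ambient) + heat_load)
        temps.append(t)
    return temps


def dvfs_frequency(temp_c, f_max=1800, f_min=600):
    """Toy DVFS-style throttling curve (MHz): full clock below 60°C,
    linear scaling down to f_min at 90°C and above. Real GPU
    governors use vendor-specific curves."""
    if temp_c <= 60:
        return f_max
    if temp_c >= 90:
        return f_min
    frac = (90 - temp_c) / 30.0  # fraction of headroom remaining
    return f_min + frac * (f_max - f_min)


# Hot-cabin scenario: 50°C ambient, sustained compute load.
# Steady state is t_ambient + heat_load/k = 50 + 2.0/0.05 = 90°C,
# so the simulated clock gradually throttles toward f_min.
trace = simulate_gpu_temperature(t_ambient=50, t_init=50, heat_load=2.0)
freqs = [dvfs_frequency(t) for t in trace]
```

Feeding such a temperature trace into a frequency controller is what lets a test harness reproduce thermal throttling deterministically, instead of waiting for a physical GPU to heat up.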