ThermalGuardian: Temperature-Aware Testing of Automotive Deep Learning Frameworks

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modern automotive deep learning frameworks exhibit temperature-sensitive failures—such as GPU thermal throttling, computation latency spikes, high/mixed-precision errors, and time-series synchronization anomalies—under extreme ambient temperatures (−40°C to 50°C). Existing testing methodologies neglect thermal effects and thus fail to detect such defects. Method: We propose the first temperature-aware testing framework for deep learning systems, modeling GPU thermal dynamics via Newton's law of cooling, integrating real-time temperature-driven frequency scaling, and generating test cases through operator-level model mutation rules tailored to thermal sensitivity. Contribution/Results: Evaluated on mainstream automotive AI frameworks, our approach uncovers novel thermally induced defects—including latency spikes, accuracy degradation, and synchronization failures—demonstrating its effectiveness at exposing environment-dependent vulnerabilities. This work bridges a critical gap in AI framework quality assurance by treating environmental temperature as a first-class testing dimension.

📝 Abstract
Deep learning models play a vital role in autonomous driving systems, supporting critical functions such as environmental perception. To accelerate model inference, the deployment of these models relies on automotive deep learning frameworks, for example, PaddleInference in Apollo and TensorRT in Autoware. However, unlike cloud deployments, vehicular environments experience extreme ambient temperatures ranging from -40°C to 50°C, which significantly affect GPU temperature. Additionally, heat generated during computation further raises GPU temperature. These temperature fluctuations trigger dynamic GPU frequency adjustments through mechanisms such as DVFS. However, automotive deep learning frameworks are designed without considering the impact of temperature-induced frequency variations. When deployed on temperature-varying GPUs, these frameworks suffer critical quality issues: compute-intensive operators face delays or errors, high/mixed-precision operators suffer precision errors, and time-series operators suffer synchronization issues. Existing deep learning framework testing methods cannot detect these quality issues because they ignore temperature's effect on deep learning framework quality. To bridge this gap, we propose ThermalGuardian, the first testing method for automotive deep learning frameworks under temperature-varying environments. Specifically, ThermalGuardian generates test input models using model mutation rules targeting temperature-sensitive operators, simulates GPU temperature fluctuations based on Newton's law of cooling, and controls GPU frequency based on real-time GPU temperature.
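The thermal model underlying ThermalGuardian is Newton's law of cooling, extended with a compute-heating term. A minimal sketch of how GPU temperature could be simulated under this law; the constants (`k`, `heat`) are illustrative assumptions, not values from the paper:

```python
# Sketch: GPU temperature under Newton's law of cooling plus a constant
# compute-heating term. Assumes dT/dt = -k * (T - T_ambient) + heat,
# integrated with a simple forward-Euler step. All parameter values are
# hypothetical, chosen only to illustrate the dynamics.

def simulate_temperature(t_ambient, t_init, k, heat, steps, dt=1.0):
    """Return the temperature trace over `steps` Euler steps of size `dt`.

    t_ambient : ambient (cabin/engine-bay) temperature in °C
    t_init    : initial GPU temperature in °C
    k         : cooling coefficient (1/s), assumed constant
    heat      : heating rate from computation (°C/s), assumed constant
    """
    t = t_init
    trace = []
    for _ in range(steps):
        # Newton's law of cooling pulls T toward t_ambient; compute load
        # pushes it up, so T settles near t_ambient + heat / k.
        t += dt * (-k * (t - t_ambient) + heat)
        trace.append(t)
    return trace
```

Under this model the GPU equilibrates at `t_ambient + heat / k`, which is why the same workload that stays cool at −40°C ambient can cross throttling thresholds at 50°C ambient.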
Problem

Research questions and friction points this paper is trying to address.

Testing automotive deep learning frameworks under temperature variations
Detecting temperature-induced GPU frequency impact on framework quality
Addressing precision errors and synchronization issues in vehicular environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model mutation rules for temperature-sensitive operators
Simulates GPU temperature fluctuations via Newton's cooling
Controls GPU frequency based on real-time temperature
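The frequency-control ingredient above can be sketched as a simple throttling curve mapping temperature to clock speed, in the spirit of DVFS. The thresholds and clock values below are hypothetical, not the paper's actual parameters:

```python
# Sketch of temperature-driven frequency scaling, loosely modeled on DVFS
# throttling behavior: full clock below a throttle threshold, a linear
# ramp-down between threshold and shutdown temperature, and a floor clock
# above it. All numeric defaults are assumptions for illustration.

def gpu_frequency_mhz(temp_c,
                      base_mhz=1500.0,
                      throttle_start_c=80.0,
                      shutdown_c=95.0,
                      min_mhz=300.0):
    """Map a GPU temperature (°C) to a simulated core clock (MHz)."""
    if temp_c <= throttle_start_c:
        return base_mhz          # no throttling
    if temp_c >= shutdown_c:
        return min_mhz           # hard floor near shutdown
    # Linear interpolation between full and floor clocks.
    frac = (temp_c - throttle_start_c) / (shutdown_c - throttle_start_c)
    return base_mhz - frac * (base_mhz - min_mhz)
```

Driving such a curve with the simulated temperature trace is what lets a test harness reproduce, on a bench GPU, the latency spikes and synchronization issues that only appear under in-vehicle thermal conditions.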
Yinglong Zou
State Key Laboratory for Novel Software Technology, Nanjing University, China
Juan Zhai
University of Massachusetts, Amherst
software text analytics · software reliability · deep learning
Chunrong Fang
Software Institute, Nanjing University
Software Testing · Software Engineering · Computer Science
Zhenyu Chen
State Key Laboratory for Novel Software Technology, Nanjing University, China