Deep Learning Framework Testing via Heuristic Guidance Based on Multiple Model Measurements

📅 2025-07-20
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing deep learning framework testing methods rely on a single bug-detection metric and suffer from three key limitations: (1) they cannot quantify operator-combination diversity, (2) they neglect execution-time constraints, and (3) they fail to model correlations among multiple evaluation metrics. To address these, we propose a multi-dimensional heuristic-guided model generation approach that, for the first time, jointly optimizes three orthogonal objectives: operator-combination diversity, bug-detection capability, and execution time. Leveraging statistical correlation analysis among these metrics, we design a hierarchical trade-off mechanism. Our method integrates operator-combination analysis, dynamic time modeling, and correlation-aware search to allocate testing resources effectively. Experimental results demonstrate that, under identical testing budgets, our approach significantly improves both the bug detection rate and operator coverage, achieving an average 32.7% gain in bug-triggering efficiency over state-of-the-art baselines.

๐Ÿ“ Abstract
Deep learning frameworks serve as the foundation for developing and deploying deep learning applications. To enhance the quality of deep learning frameworks, researchers have proposed numerous testing methods that use deep learning models as test inputs. However, existing methods predominantly use model bug-detection effectiveness as the sole heuristic indicator, which presents three critical limitations. Firstly, existing methods fail to quantitatively measure a model's operator-combination variety, potentially missing critical operator combinations that could trigger framework bugs. Secondly, existing methods neglect to measure model execution time, omitting many models with the potential to detect more framework bugs within limited testing time. Thirdly, existing methods overlook correlations between different model measurements, relying on single-indicator heuristic guidance without considering their trade-offs. To overcome these limitations, we propose DLMMM, the first deep learning framework testing method to incorporate multiple model measurements into heuristic guidance and fuse these measurements to achieve their trade-off. DLMMM first quantitatively measures a model's bug-detection performance, operator-combination variety, and execution time. It then fuses these measurements based on their correlation to achieve their trade-off. To further enhance testing effectiveness, DLMMM designs multi-level heuristic guidance for test-input model generation.
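The abstract does not give the fusion formula. As one illustrative sketch, not DLMMM's actual design, the three measurements could be combined into a single heuristic value by a weighted sum; the `fused_score` helper and its weight values below are hypothetical:

```python
def fused_score(bug_score: float, variety: float, exec_time: float,
                weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Combine three model measurements into one heuristic value.

    bug_score and variety are assumed normalized to [0, 1];
    exec_time is in seconds. The fixed weights are illustrative only,
    not the correlation-derived trade-off the paper describes.
    """
    w_bug, w_var, w_time = weights
    time_term = 1.0 / (1.0 + exec_time)  # faster models score higher
    return w_bug * bug_score + w_var * variety + w_time * time_term

# A fast, diverse, bug-prone model outranks a slow, uniform one:
assert fused_score(0.9, 0.8, 0.1) > fused_score(0.9, 0.2, 10.0)
```

A generator guided by such a score would prefer candidate models that balance all three objectives rather than maximizing bug detection alone.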
Problem

Research questions and friction points this paper is trying to address.

Quantify operator combination variety in deep learning models
Measure model execution time for efficient bug detection
Fuse multiple model measurements to optimize testing trade-offs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantitatively measures operator combination variety
Incorporates model execution time measurement
Fuses multiple model measurements for trade-off
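The page does not define the variety metric itself. Assuming a model can be flattened into an operator sequence, one simple way to quantify operator-combination variety is to count distinct adjacent operator pairs; this bigram count is an illustrative stand-in, not necessarily DLMMM's metric:

```python
def combination_variety(op_sequence: list[str]) -> int:
    """Count distinct adjacent operator pairs (bigrams) in a model's
    flattened operator sequence; an illustrative variety metric."""
    return len(set(zip(op_sequence, op_sequence[1:])))

# Repetitive Conv-ReLU stacking yields low variety:
assert combination_variety(["conv2d", "relu", "conv2d", "relu"]) == 2
# A sequence of all-distinct operators yields one pair per adjacency:
assert combination_variety(["conv2d", "bn", "relu", "pool"]) == 3
```

Under such a metric, a generator rewarded for higher variety is pushed toward models exercising rarer operator combinations, which is where the paper argues framework bugs tend to hide.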
Authors

Yinglong Zou (State Key Laboratory for Novel Software Technology, Nanjing University, China)
Juan Zhai (University of Massachusetts, Amherst)
Chunrong Fang (Software Institute, Nanjing University)
Yanzhou Mu (Nanjing University)
Jiawei Liu (State Key Laboratory for Novel Software Technology, Nanjing University, China)
Zhenyu Chen (State Key Laboratory for Novel Software Technology, Nanjing University, China)