๐ค AI Summary
Existing deep learning framework testing methods rely on a single bug-detection metric, suffering from three key limitations: (1) inability to quantify operator composition diversity, (2) neglect of execution-time constraints, and (3) failure to model correlations among multiple evaluation metrics. To address these, we propose a multi-dimensional heuristic-guided model generation approach thatโ for the first timeโjointly optimizes three orthogonal objectives: operator composition diversity, bug-detection capability, and execution time. Leveraging statistical correlation analysis among these metrics, we design a hierarchical trade-off mechanism. Our method integrates operator composition analysis, dynamic time modeling, and correlation-aware search to achieve optimal allocation of testing resources. Experimental results demonstrate that, under identical testing budgets, our approach significantly improves both bug detection rate and operator coverage, achieving an average 32.7% gain in bug-triggering efficiency over state-of-the-art baseline methods.
๐ Abstract
Deep learning frameworks serve as the foundation for developing and deploying deep learning applications. To enhance the quality of deep learning frameworks, researchers have proposed numerous testing methods using deep learning models as test inputs. However, existing methods predominantly measure model bug detection effectiveness as heuristic indicators, presenting three critical limitations: Firstly, existing methods fail to quantitatively measure model's operator combination variety, potentially missing critical operator combinations that could trigger framework bugs. Secondly, existing methods neglect measuring model execution time, resulting in the omission of numerous models potential for detecting more framework bugs within limited testing time. Thirdly, existing methods overlook correlation between different model measurements, relying simply on single-indicator heuristic guidance without considering their trade-offs. To overcome these limitations, we propose DLMMM, the first deep learning framework testing method to include multiple model measurements into heuristic guidance and fuse these measurements to achieve their trade-off. DLMMM firstly quantitatively measures model's bug detection performance, operator combination variety, and model execution time. After that, DLMMM fuses the above measurements based on their correlation to achieve their trade-off. To further enhance testing effectiveness, DLMMM designs multi-level heuristic guidance for test input model generation.