Deep Learning Framework Testing via Heuristic Guidance Based on Multiple Model Measurements

📅 2025-07-20
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing deep learning framework testing methods rely on a single bug-detection metric and suffer from three key limitations: (1) they cannot quantify operator-combination diversity, (2) they neglect execution-time constraints, and (3) they fail to model correlations among multiple evaluation metrics. To address these, we propose a multi-dimensional heuristic-guided model generation approach that, for the first time, jointly optimizes three orthogonal objectives: operator-combination diversity, bug-detection capability, and execution time. Leveraging statistical correlation analysis among these metrics, we design a hierarchical trade-off mechanism. Our method integrates operator-combination analysis, dynamic time modeling, and correlation-aware search to allocate testing resources effectively. Experimental results demonstrate that, under identical testing budgets, our approach significantly improves both the bug detection rate and operator coverage, achieving an average 32.7% gain in bug-triggering efficiency over state-of-the-art baselines.

๐Ÿ“ Abstract
Deep learning frameworks serve as the foundation for developing and deploying deep learning applications. To enhance the quality of deep learning frameworks, researchers have proposed numerous testing methods that use deep learning models as test inputs. However, existing methods predominantly use model bug-detection effectiveness as the sole heuristic indicator, which presents three critical limitations. Firstly, existing methods fail to quantitatively measure a model's operator-combination variety, potentially missing critical operator combinations that could trigger framework bugs. Secondly, existing methods neglect to measure model execution time, omitting many models with the potential to detect more framework bugs within limited testing time. Thirdly, existing methods overlook correlations between different model measurements, relying on single-indicator heuristic guidance without considering their trade-offs. To overcome these limitations, we propose DLMMM, the first deep learning framework testing method to incorporate multiple model measurements into heuristic guidance and fuse these measurements to achieve their trade-off. DLMMM first quantitatively measures a model's bug-detection performance, operator-combination variety, and execution time. It then fuses these measurements based on their correlation to achieve their trade-off. To further enhance testing effectiveness, DLMMM designs multi-level heuristic guidance for test-input model generation.
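The abstract does not give the fusion formula. As one illustrative sketch, not DLMMM's actual design, the three measurements could be combined into a single heuristic value by a weighted sum; the `fused_score` helper and its weight values below are hypothetical:

```python
def fused_score(bug_score: float, variety: float, exec_time: float,
                weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Combine three model measurements into one heuristic value.

    bug_score and variety are assumed normalized to [0, 1];
    exec_time is in seconds. The fixed weights are illustrative only,
    not the correlation-derived trade-off the paper describes.
    """
    w_bug, w_var, w_time = weights
    time_term = 1.0 / (1.0 + exec_time)  # faster models score higher
    return w_bug * bug_score + w_var * variety + w_time * time_term

# A fast, diverse, bug-prone model outranks a slow, uniform one:
assert fused_score(0.9, 0.8, 0.1) > fused_score(0.9, 0.2, 10.0)
```

A generator guided by such a score would prefer candidate models that balance all three objectives rather than maximizing bug detection alone.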
Problem

Research questions and friction points this paper is trying to address.

Quantify operator combination variety in deep learning models
Measure model execution time for efficient bug detection
Fuse multiple model measurements to optimize testing trade-offs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantitatively measures operator combination variety
Incorporates model execution time measurement
Fuses multiple model measurements for trade-off
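The page does not define the variety metric itself. Assuming a model can be flattened into an operator sequence, one simple way to quantify operator-combination variety is to count distinct adjacent operator pairs; this bigram count is an illustrative stand-in, not necessarily DLMMM's metric:

```python
def combination_variety(op_sequence: list[str]) -> int:
    """Count distinct adjacent operator pairs (bigrams) in a model's
    flattened operator sequence; an illustrative variety metric."""
    return len(set(zip(op_sequence, op_sequence[1:])))

# Repetitive Conv-ReLU stacking yields low variety:
assert combination_variety(["conv2d", "relu", "conv2d", "relu"]) == 2
# A sequence of all-distinct operators yields one pair per adjacency:
assert combination_variety(["conv2d", "bn", "relu", "pool"]) == 3
```

Under such a metric, a generator rewarded for higher variety is pushed toward models exercising rarer operator combinations, which is where the paper argues framework bugs tend to hide.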
Authors

Yinglong Zou (State Key Laboratory for Novel Software Technology, Nanjing University, China)
Juan Zhai (University of Massachusetts, Amherst)
Chunrong Fang (Software Institute, Nanjing University)
Yanzhou Mu (Nanjing University)
Jiawei Liu (State Key Laboratory for Novel Software Technology, Nanjing University, China)
Zhenyu Chen (State Key Laboratory for Novel Software Technology, Nanjing University, China)