🤖 AI Summary
To address the low prediction accuracy of Software Reliability Growth Models (SRGMs) in data-scarce scenarios, such as early testing or safety-critical systems, this paper proposes a deep learning framework that integrates synthetic data generation with cross-project transfer learning. It feeds synthetic data generated from classical SRGMs into the transfer learning pipeline and uses a cross-correlation-based clustering strategy to select the synthetic datasets whose patterns most resemble the target project, thereby mitigating data insufficiency while sidestepping the confidentiality constraints on real project data. In an evaluation on 60 real-world datasets, the proposed method improves prediction accuracy by up to 23.3% over conventional SRGMs and 32.2% over cross-project deep learning models trained solely on real data. The authors also caution that excessive synthetic data, or a naive mix of synthetic and real data, can degrade performance. These results suggest the approach is promising for reliability prediction during early testing phases.
📝 Abstract
Software Reliability Growth Models (SRGMs) are widely used to predict software reliability based on defect discovery data collected during testing or operational phases. However, their predictive accuracy often degrades in data-scarce environments, such as early-stage testing or safety-critical systems. Although cross-project transfer learning has been explored to mitigate this issue by leveraging data from past projects, its applicability remains limited due to the scarcity and confidentiality of real-world datasets. To overcome these limitations, we propose Deep Synthetic Cross-project SRGM (DSC-SRGM), a novel approach that integrates synthetic data generation with cross-project transfer learning. Synthetic datasets are generated using traditional SRGMs to preserve the statistical characteristics of real-world defect discovery trends. A cross-correlation-based clustering method is applied to identify synthetic datasets with patterns similar to the target project. These datasets are then used to train a deep learning model for reliability prediction. The proposed method is evaluated on 60 real-world datasets, and its performance is compared with both traditional SRGMs and cross-project deep learning models trained on real-world datasets. DSC-SRGM achieves up to 23.3% improvement in predictive accuracy over traditional SRGMs and 32.2% over cross-project deep learning models trained on real-world datasets. However, excessive use of synthetic data or a naive combination of synthetic and real-world data may degrade prediction performance, highlighting the importance of maintaining an appropriate data balance. These findings indicate that DSC-SRGM is a promising approach for software reliability prediction in data-scarce environments.
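The two data-preparation steps described above (generating synthetic defect curves from a traditional SRGM, then picking the curves that resemble the target project) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Goel-Okumoto model as the generating SRGM, the parameter ranges, the function names, and the use of a simple top-k ranking by peak normalized cross-correlation in place of the paper's clustering method.

```python
import numpy as np


def goel_okumoto_mean(t, a, b):
    """Expected cumulative defects m(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - np.exp(-b * t))


def generate_synthetic(n_series, n_days, rng):
    """Sample cumulative defect curves from a Goel-Okumoto model.

    The (a, b) ranges below are hypothetical, chosen only for illustration.
    """
    t = np.arange(0, n_days + 1)
    series = []
    for _ in range(n_series):
        a = rng.uniform(50, 500)    # assumed range: total expected defects
        b = rng.uniform(0.01, 0.2)  # assumed range: detection rate
        # Expected new defects per day, then Poisson noise around them.
        mean_increments = np.diff(goel_okumoto_mean(t, a, b))
        daily = rng.poisson(mean_increments)
        series.append(np.cumsum(daily))
    return np.array(series)


def max_normalized_xcorr(x, y):
    """Peak over all lags of the cross-correlation of two z-scored series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)
    y = (y - y.mean()) / (y.std() + 1e-12)
    cc = np.correlate(x, y, mode="full") / np.sqrt(len(x) * len(y))
    return cc.max()


def select_similar(target, candidates, k):
    """Indices of the k candidates most similar to the target, best first.

    Assumption: each candidate is compared on its first len(target) points,
    since the target project has only early-phase observations.
    """
    n = len(target)
    scores = np.array([max_normalized_xcorr(target, c[:n]) for c in candidates])
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(0)
synthetic = generate_synthetic(n_series=20, n_days=30, rng=rng)
early_target = np.array([2, 5, 9, 12, 14, 15, 17, 18, 18, 19])
chosen = select_similar(early_target, synthetic, k=5)
```

In this sketch, the rows `synthetic[chosen]` would serve as the training set for the downstream prediction model. The abstract's warning about data balance suggests that the number of selected synthetic series (`k` here) is a setting worth tuning rather than maximizing.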