🤖 AI Summary
Automated crash reporting systems generate large volumes of duplicate crash reports, which burden developers. Existing stack trace-based deduplication methods, whether built on string matching, rule-based heuristics, or deep learning (DL) models, often fail to capture the contextual and structural relationships within stack traces. To address this, the authors propose *dedupT*, a Transformer-based approach that adapts a pretrained language model (PLM) to stack traces and models each trace as a whole rather than as isolated frames. The PLM's embeddings are then used to train a fully connected network (FCN) that ranks candidate duplicates. In experiments on four public datasets, *dedupT* often improves Mean Reciprocal Rank (MRR) by more than 15% over the best DL baseline and by up to 9% over traditional methods such as sequence alignment and information retrieval, while achieving higher ROC-AUC in detecting unique crash reports.
📝 Abstract
Automated crash reporting systems generate large volumes of duplicate reports, overwhelming issue-tracking systems and increasing developer workload. Traditional stack trace-based deduplication methods, which rely on string similarity, rule-based heuristics, or deep learning (DL) models, often fail to capture the contextual and structural relationships within stack traces. We propose dedupT, a transformer-based approach that models stack traces holistically rather than as isolated frames. dedupT first adapts a pretrained language model (PLM) to stack traces, then uses its embeddings to train a fully connected network (FCN) to rank duplicate crashes effectively. Extensive experiments on real-world datasets show that dedupT outperforms existing DL and traditional methods (e.g., sequence alignment and information retrieval techniques) in both duplicate ranking and unique crash detection, significantly reducing manual triage effort. On four public datasets, dedupT often improves Mean Reciprocal Rank (MRR) by over 15% compared to the best DL baseline and by up to 9% over traditional methods, while achieving higher Receiver Operating Characteristic Area Under the Curve (ROC-AUC) in detecting unique crash reports. Our work advances the integration of modern natural language processing (NLP) techniques into software engineering, providing an effective solution for stack trace-based crash deduplication.
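The two metrics named in the abstract can be made concrete with a minimal sketch. It uses the textbook definitions of MRR and ROC-AUC, not the paper's evaluation code, which is not described here. The function names, the bucket identifiers, and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant candidate, or 0.0 if none is ranked."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
) -> float:
    """Average the reciprocal rank over all query crash reports."""
    if len(rankings) != len(relevants):
        raise ValueError("rankings and relevants must have the same length")
    if not rankings:
        raise ValueError("at least one query is required")
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Here label 1 is assumed to mean "unique crash"
    and the score is assumed to be a uniqueness score (higher = more unique).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three query reports, each with a ranked candidate list.
rankings = [["b1", "b2", "b3"], ["b4", "b5", "b6"], ["b7", "b8"]]
relevants = [{"b1"}, {"b6"}, {"b9"}]  # true duplicate bucket per query
mrr = mean_reciprocal_rank(rankings, relevants)  # (1 + 1/3 + 0) / 3

# Hypothetical uniqueness scores for four reports; 1 = truly unique.
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 0, 1])
```

In this toy setting, a query whose true duplicate never appears in the ranking contributes 0 to MRR, which is one common convention; other evaluations exclude such queries instead.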