Multi-View Adaptive Contrastive Learning for Information Retrieval Based Fault Localization

📅 2024-09-19

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing IR-based fault localization methods overlook critical semantic relationships, including report-code interactions, inter-report similarities, and co-citation among source code files. To address this, we propose a novel multi-view graph neural network framework that jointly models these three complementary semantic views. We further design a cross-view adaptive contrastive learning mechanism to suppress noise and enhance shared semantic representation learning. By combining multi-view data augmentation with semantic vector matching, our approach improves the alignment between bug reports and source code. Extensive experiments on five open-source Java projects show improvements over the best-performing baseline of up to 28.93% in Accuracy@1, 25.57% in Mean Average Precision (MAP), and 20.35% in Mean Reciprocal Rank (MRR). These results support the effectiveness of multi-view collaborative modeling and adaptive contrastive learning for IR-based fault localization.
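The three reported metrics can be made concrete with a small sketch. It assumes each bug report comes with a ranked list of candidate files and a set of files known to be faulty; the function names, the dictionary layout, and the toy data are illustrative and are not taken from the paper's evaluation code.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Return 1 / rank of the first relevant file, or 0.0 if none is ranked."""
    for position, file_id in enumerate(ranked, start=1):
        if file_id in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision@k over the ranks k that hold a relevant file."""
    hits = 0
    precision_sum = 0.0
    for position, file_id in enumerate(ranked, start=1):
        if file_id in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def evaluate(rankings: Dict[str, List[str]],
             ground_truth: Dict[str, Set[str]]) -> Dict[str, float]:
    """Compute Accuracy@1, MAP and MRR over all bug reports in `rankings`."""
    if not rankings:
        raise ValueError("at least one ranked bug report is required")
    n = len(rankings)
    top1 = ap = rr = 0.0
    for report_id, ranked in rankings.items():
        relevant = ground_truth.get(report_id, set())
        top1 += 1.0 if ranked and ranked[0] in relevant else 0.0
        ap += average_precision(ranked, relevant)
        rr += reciprocal_rank(ranked, relevant)
    return {"accuracy@1": top1 / n, "map": ap / n, "mrr": rr / n}


# Toy data (hypothetical): two bug reports ranked over three files.
rankings = {
    "bug-1": ["A.java", "B.java", "C.java"],
    "bug-2": ["B.java", "C.java", "A.java"],
}
ground_truth = {"bug-1": {"A.java"}, "bug-2": {"A.java", "C.java"}}
scores = evaluate(rankings, ground_truth)
print(scores)  # accuracy@1 = 0.5, map = 19/24 (about 0.7917), mrr = 0.75
```

Accuracy@1 only rewards a correct top hit, MRR rewards the first correct hit wherever it lands, and MAP also accounts for the remaining faulty files, which is why the three numbers can move independently.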

📝 Abstract

Most studies have focused on information retrieval-based techniques for fault localization, which build representations for bug reports and source code files and match their semantic vectors through similarity measurement. However, such approaches often ignore useful information that might help improve localization performance, such as 1) the interaction relationship between bug reports and source code files; 2) the similarity relationship between bug reports; and 3) the co-citation relationship between source code files. In this paper, we propose a novel approach named Multi-View Adaptive Contrastive Learning for Information Retrieval Fault Localization (MACL-IRFL) to learn the above-mentioned relationships for software fault localization. Specifically, we first generate data augmentations from the report-code interaction view, the report-report similarity view and the code-code co-citation view separately, and adopt a graph neural network to aggregate the information of bug reports or source code files from the three views in the embedding process. Moreover, we perform contrastive learning across these views. Our design of the contrastive learning task forces the bug report representations to encode information shared by the report-report and report-code views, and the source code file representations to encode information shared by the code-code and report-code views, thereby alleviating the noise from auxiliary information. Finally, to evaluate the performance of our approach, we conduct extensive experiments on five open-source Java projects. The results show that our model improves over the best baseline by up to 28.93%, 25.57% and 20.35% on Accuracy@1, MAP and MRR, respectively.
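The cross-view contrastive objective can be illustrated with a generic InfoNCE loss. This is a sketch under assumptions, not the paper's formulation: the abstract does not give the loss, so the cosine similarity, the `temperature` value, and the function names are illustrative, and the "adaptive" component of MACL-IRFL is not modeled here. Row i of each view is assumed to embed the same bug report (or the same source file), so matching rows form positive pairs and all other rows act as negatives.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(view_a: List[Vector], view_b: List[Vector],
             temperature: float = 0.2) -> float:
    """Mean InfoNCE loss with view_a rows as anchors and view_b rows as candidates."""
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("both views need the same non-zero number of rows")
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Negative log-probability of picking the matching row i.
        total += log_denominator - logits[i]
    return total / len(view_a)


# Toy embeddings (hypothetical) of two bug reports in two views.
report_code_view = [[1.0, 0.0], [0.0, 1.0]]
report_report_view = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = info_nce(report_code_view, report_report_view)
swapped_loss = info_nce(report_code_view, report_report_view[::-1])
print(aligned_loss < swapped_loss)
```

Minimizing such a loss pulls the two views of the same node together, which matches the stated goal of keeping only the information the views share.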

Problem

Research questions and friction points this paper is trying to address.

Improving fault localization by modeling bug-report and source-code interactions

Enhancing localization via report similarity and code co-citation relationships

Reducing noise in representations using multi-view contrastive learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view adaptive contrastive learning for fault localization

Graph neural network aggregates three-view information

Contrastive learning across views reduces noise
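As a rough illustration of how one view could be built and aggregated, the sketch below derives code-code co-citation edges from report-to-file links and applies one round of mean aggregation. Two assumptions are made that the page does not confirm: "co-citation" is taken to mean that two files are linked to the same bug report, and plain neighbor averaging stands in for the unspecified graph neural network layer. All names and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def co_citation_edges(links: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for each file pair, how many bug reports link to both files."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for files in links.values():
        for first, second in combinations(sorted(files), 2):
            counts[(first, second)] += 1
    return dict(counts)


def neighbor_map(edges: Dict[Tuple[str, str], int]) -> Dict[str, Set[str]]:
    """Turn undirected weighted edges into an adjacency map (weights ignored)."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for first, second in edges:
        neighbors[first].add(second)
        neighbors[second].add(first)
    return dict(neighbors)


def mean_aggregate(embeddings: Dict[str, List[float]],
                   neighbors: Dict[str, Set[str]]) -> Dict[str, List[float]]:
    """One message-passing step: average each node with its known neighbors."""
    updated = {}
    for node, vector in embeddings.items():
        group = [vector] + [
            embeddings[other]
            for other in sorted(neighbors.get(node, set()))
            if other in embeddings
        ]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


# Toy report-to-file links (hypothetical).
links = {
    "bug-1": {"A.java", "B.java"},
    "bug-2": {"A.java", "B.java", "C.java"},
}
edges = co_citation_edges(links)
embeddings = {"A.java": [1.0, 0.0], "B.java": [0.0, 1.0], "C.java": [1.0, 1.0]}
smoothed = mean_aggregate(embeddings, neighbor_map(edges))
print(edges)
print(smoothed)
```

The report-report similarity view and the report-code interaction view could be aggregated with the same step by swapping in their own adjacency maps.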

🔎 Similar Papers

Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis