ToxiShield: Promoting Inclusive Developer Communication through Real-Time Toxicity Filtering

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the harm that toxic communication in code review does to collaboration, a problem made worse by the lack of real-time intervention tools. To fill this gap, the paper introduces ToxiShield, a browser extension for GitHub pull requests that mitigates toxicity in real time through three components: toxicity detection, fine-grained toxicity categorization with explanations, and constructive message rewriting. A BERT-based model achieves 98% accuracy and a 97% F1-score in binary toxicity classification; prompt-tuned Claude 3.5 Sonnet performs multiclass toxicity categorization (39% MCC, 42% F1); and a fine-tuned Llama 3.2 model rewrites toxic comments with 95.27% style transfer accuracy. A human evaluation with 10 participants, based on the Technology Acceptance Model, confirmed the tool's perceived usefulness and ease of adoption.
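
The three-stage flow described above (detect, then explain, then rewrite) can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the names `ReviewResult`, `review_comment`, `keyword_filter`, `toy_coach`, and `toy_reframer` are hypothetical, and the toy stand-ins take the place of the BERT, Claude 3.5 Sonnet, and Llama 3.2 models named in the summary.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ReviewResult:
    """Outcome of passing one review comment through the three stages."""
    toxic: bool
    category: Optional[str] = None
    explanation: Optional[str] = None
    suggestion: Optional[str] = None


def review_comment(
    text: str,
    is_toxic: Callable[[str], bool],
    explain: Callable[[str], Tuple[str, str]],
    reframe: Callable[[str], str],
) -> ReviewResult:
    """Run detection first; only toxic comments reach the later stages."""
    if not is_toxic(text):
        return ReviewResult(toxic=False)
    category, explanation = explain(text)
    return ReviewResult(True, category, explanation, reframe(text))


# Toy stand-ins (assumptions for illustration, not the paper's models).
def keyword_filter(text: str) -> bool:
    return any(word in text.lower() for word in ("stupid", "idiot"))


def toy_coach(text: str) -> Tuple[str, str]:
    return ("insult", "The comment targets the author rather than the code.")


def toy_reframer(text: str) -> str:
    return "Could you take another look at this change? I think it has an issue."


if __name__ == "__main__":
    result = review_comment(
        "This is a stupid change.", keyword_filter, toy_coach, toy_reframer
    )
    print(result)
```

Passing the stages in as callables keeps the cheap binary check in front of the more expensive explanation and rewriting steps, which is one plausible way to keep a real-time tool responsive.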

📝 Abstract

Toxic interactions during code reviews can undermine teamwork and hinder productivity in software engineering (SE) teams. While prior studies explore toxicity detection and empirical investigation, they lack real-time detoxification tools to support the SE community. To address this gap, we present ToxiShield, a browser extension for GitHub pull requests built from three modules: i) Toxicity Filter, which identifies whether a text is toxic; ii) Communication Coach, which provides just-in-time, fine-grained toxicity categorization with explanations; and iii) Reframer, which generates a revised, constructive alternative to a toxic text. For each module, we trained and evaluated multiple deep learning models and Large Language Models (LLMs) to identify the best choice. A BERT-based binary detection model, trained on 38,761 code review samples, achieves 98% accuracy and an F1-score of 97% and was selected for the Toxicity Filter module. For the Communication Coach, prompt-tuned Claude 3.5 Sonnet achieved the best performance, with 39% MCC and 42% F1 in multiclass toxicity classification with detailed reasoning. For the Reframer, we evaluated five LLMs using a fine-tuning strategy on a dataset of 10,120 code review comments. The fine-tuned Llama 3.2 model achieves 95.27% style transfer accuracy, 97.03% fluency, 67.07% content preservation, and an 84% J-score. We further validated ToxiShield through a human evaluation using the Technology Acceptance Model with 10 participants, confirming its perceived usefulness and ease of adoption. ToxiShield sets a benchmark for advancing constructive communication in software engineering, driving inclusivity and healthier collaboration in open-source communities.
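
For readers less familiar with the classification metrics quoted in the abstract, the sketch below shows the standard binary definitions of accuracy, F1, and the Matthews correlation coefficient (MCC) computed from confusion-matrix counts. The counts are made up for illustration and are not taken from the paper. The abstract's MCC and F1 figures for the Communication Coach are multiclass, which generalizes these formulas and is not covered here; the function names are hypothetical.

```python
import math


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Binary Matthews correlation coefficient, ranging from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Illustrative counts only: 40 true positives, 10 false positives,
    # 10 false negatives, 40 true negatives.
    tp, fp, fn, tn = 40, 10, 10, 40
    print(accuracy(tp, fp, fn, tn), f1_score(tp, fp, fn), mcc(tp, fp, fn, tn))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when toxic comments are a small minority of the data.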

Problem

Research questions and friction points this paper is trying to address.

toxicity

code review

developer communication

software engineering

real-time filtering

Innovation

Methods, ideas, or system contributions that make the work stand out.

real-time toxicity filtering

constructive reframing

fine-tuned LLMs