DeTox-Fed: Detecting Toxic Conversations in the Fediverse with Federated Graph Neural Networks

📅 2026-05-20

🤖 AI Summary

This work addresses the challenge of detecting toxic content in decentralized social networks such as the Fediverse. There, data is siloed across instances, moderation policies differ from instance to instance, and each instance sees only part of a conversation, all of which hinder toxicity detection. The paper proposes DeTox-Fed, a federated graph-learning framework in which each instance builds a local conversation graph and the instances collaboratively train a graph neural network without sharing raw conversations or moderation labels. The approach combines conversational structure, user-interaction patterns, conversation-level statistics, and aggregate sentiment signals, and it keeps data local while still learning a shared classifier. Experiments on a large Pleroma conversation dataset show stable detection of toxic conversations under practical constraints: limited local labels, partial client participation, and varying moderation thresholds.
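The local graph-construction step can be sketched in a few lines of Python. The sketch follows the graph definition given in the abstract: nodes are conversation trees, and an edge joins two conversations that share at least one participant. The `Post`/`Conversation` types, the six node features, and the per-post `sentiment` score are hypothetical illustrations, not the paper's actual feature set or data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations
from statistics import mean
from typing import List, Optional, Set, Tuple


@dataclass
class Post:
    post_id: str
    author: str
    parent_id: Optional[str]  # None for the root of a conversation tree
    sentiment: float          # hypothetical per-post score in [-1, 1] from any sentiment model


@dataclass
class Conversation:
    conv_id: str
    posts: List[Post] = field(default_factory=list)


def tree_depth(conv: Conversation) -> int:
    """Longest reply chain; a reply whose parent is not visible locally counts one hop."""
    parents = {p.post_id: p.parent_id for p in conv.posts}

    def depth(pid: str) -> int:
        d = 0
        while parents.get(pid) is not None:
            pid = parents[pid]
            d += 1
        return d

    return max((depth(p.post_id) for p in conv.posts), default=0)


def conversation_features(conv: Conversation) -> List[float]:
    """Hypothetical node features: structure, participation and aggregate sentiment."""
    if not conv.posts:
        return [0.0] * 6
    authors = {p.author for p in conv.posts}
    sentiments = [p.sentiment for p in conv.posts]
    return [
        float(len(conv.posts)),          # conversation size
        float(len(authors)),             # distinct participants
        float(tree_depth(conv)),         # longest reply chain
        mean(sentiments),                # aggregate (mean) sentiment
        min(sentiments),                 # most negative post
        len(conv.posts) / len(authors),  # posts per participant
    ]


def build_local_graph(convs: List[Conversation]) -> Set[Tuple[str, str]]:
    """Undirected edges between conversations that share at least one participant."""
    convs_by_user = defaultdict(set)
    for conv in convs:
        for post in conv.posts:
            convs_by_user[post.author].add(conv.conv_id)
    edges = set()
    for conv_ids in convs_by_user.values():
        # Each pair appears once, ordered so that a < b.
        edges.update(combinations(sorted(conv_ids), 2))
    return edges


# Toy data for one instance.
convs = [
    Conversation("c1", [Post("p1", "alice", None, 0.5),
                        Post("p2", "bob", "p1", -0.5),
                        Post("p3", "alice", "p2", -0.9)]),
    Conversation("c2", [Post("q1", "carol", None, 0.2),
                        Post("q2", "bob", "q1", 0.0)]),
    Conversation("c3", [Post("r1", "dave", None, 0.1)]),
]
edges = build_local_graph(convs)  # → {("c1", "c2")}: bob posts in both
features = {c.conv_id: conversation_features(c) for c in convs}
```

Indexing conversations by author avoids comparing every pair of conversations on the instance. The cost is instead driven by how many conversations each user joins, which matters for large instances.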

📝 Abstract

The rise of decentralized social networks (DSNs), and in particular the rapid uptake of the Fediverse (e.g., Pleroma, Mastodon, Lemmygrad), introduces new challenges in content moderation. Independent instances host their own data, follow different moderation policies, and often observe only partial views of conversations. We present DeTox-Fed, a federated graph-learning framework for detecting toxic conversations in DSNs without requiring instances to share raw conversations or moderation labels. Each instance constructs a local conversation graph, where nodes represent conversation trees and edges capture shared user participation across conversations. A Graph Neural Network (GNN) is then trained in a federated learning setup, allowing instances to collaboratively learn a toxicity classifier while preserving data locality. Unlike text-only moderation approaches, DeTox-Fed combines conversational structure, user-interaction patterns, conversation-level statistics, and aggregate sentiment signals. We evaluate the framework on a large Pleroma conversation dataset and show that it achieves stable toxic conversation detection under limited local labels, partial client participation, and varying moderation thresholds. Our results indicate that federated graph-based moderation is a promising direction for semi-automated moderation in decentralized social networks.
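The federated training loop can be illustrated with a deliberately tiny stand-in model. It uses one mean-aggregation message-passing step followed by a logistic classifier, trained with FedAvg-style averaging (weighted by each instance's label count) over a randomly sampled subset of instances in each round. The abstract does not specify the GNN architecture or the aggregation rule, so FedAvg, the model, and the names `aggregate`, `local_train` and `fedavg_round` are all assumptions. Only model weights leave an instance. Features, edges and labels stay local, and unlabelled nodes still inform their labelled neighbours through aggregation.

```python
import math
import random
from typing import Dict, List, Tuple

Features = Dict[str, List[float]]


def aggregate(features: Features, edges: List[Tuple[str, str]]) -> Features:
    """One message-passing step: h_v = x_v + mean of neighbour features (zero if isolated)."""
    neighbours = {v: [] for v in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    out = {}
    for v, x in features.items():
        nbrs = neighbours[v]
        if nbrs:
            m = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(len(x))]
        else:
            m = [0.0] * len(x)
        out[v] = [xi + mi for xi, mi in zip(x, m)]
    return out


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], b: float, h: List[float]) -> float:
    """Probability that a conversation node is toxic."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def local_train(w, b, client, epochs=20, lr=0.1):
    """Full-batch gradient descent on binary cross-entropy over the client's labelled nodes."""
    labels = client["labels"]  # {node_id: 0 or 1}; may cover only a few nodes
    if not labels:
        return list(w), b, 0
    h = aggregate(client["features"], client["edges"])  # raw data never leaves the client
    w = list(w)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for v, y in labels.items():
            err = predict(w, b, h[v]) - y  # dLoss/dz for sigmoid + cross-entropy
            gw = [g + err * hi for g, hi in zip(gw, h[v])]
            gb += err
        w = [wi - lr * g / len(labels) for wi, g in zip(w, gw)]
        b -= lr * gb / len(labels)
    return w, b, len(labels)


def fedavg_round(w, b, clients, fraction=0.5, rng=random):
    """One round: sample a subset of clients, train locally, average weighted by label count."""
    k = max(1, round(fraction * len(clients)))
    results = [local_train(w, b, c) for c in rng.sample(clients, k)]
    total = sum(n for _, _, n in results)
    if total == 0:  # no sampled client had labels; keep the global model
        return w, b
    new_w = [sum(rw[i] * n for rw, _, n in results) / total for i in range(len(w))]
    new_b = sum(rb * n for _, rb, n in results) / total
    return new_w, new_b


# Two toy instances with 2-d node features; labels cover only some nodes on each.
clients = [
    {"features": {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [0.9, 0.1]},
     "edges": [("a1", "a3")], "labels": {"a1": 1, "a2": 0}},
    {"features": {"b1": [0.0, 1.0], "b2": [1.0, 0.0], "b3": [0.1, 0.9]},
     "edges": [("b1", "b3")], "labels": {"b1": 0, "b2": 1}},
]
rng = random.Random(0)
w, b = [0.0, 0.0], 0.0
for _ in range(30):
    w, b = fedavg_round(w, b, clients, fraction=0.5, rng=rng)
```

With `fraction=0.5`, only one of the two toy instances trains in each round, which mimics partial client participation. Weighting by label count keeps instances with very few labels from dominating the global model. A real deployment would swap in a proper GNN library and a more careful aggregation strategy.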

Problem

Research questions and friction points this paper is trying to address.

decentralized social networks

toxic conversation detection

content moderation

Fediverse

data locality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Graph Neural Networks

Decentralized Social Networks