Toxicity in Twitch Chats: An LLM-Based Analysis Across Gaming Communities

📅 2026-05-18
🤖 AI Summary
This study addresses the lack of systematic analysis of how toxic behavior varies across gaming communities on live-streaming platforms. Using approximately 20 million chat messages from 4,452 Twitch streams spanning seven game genres, the authors perform fine-grained toxicity detection at both the genre and the individual-title level. They apply a pretrained large language model in a zero-shot classification setting, with labels following Twitch's toxicity taxonomy of four categories and eight subclasses. The model achieves an F1 score of 94.5% on the TextDetox dataset, and its agreement with human annotators is comparable to inter-human agreement. Overall, 2.4% of messages are classified as toxic; MOBA streams show the highest rate (3.2%) and sports streams the lowest (2.0%). Individual games differ significantly in their toxicity distributions even within a genre, which suggests that game-specific community norms and mechanics shape toxic behavior beyond genre-level effects.
📝 Abstract
Toxicity in online gaming communities remains a persistent challenge, manifesting across genres, platforms, and player interactions. While much research is focused on in-game toxicity, less is known about how toxic behavior varies between gaming communities on streaming platforms. To address this shortcoming, we analyze approximately 20 million chat messages from 4,452 streams, spanning seven game genres on Twitch. We categorize messages according to Twitch's toxicity taxonomy with a pre-trained Large Language Model using zero-shot classification. The taxonomy comprises four categories and eight subclasses, including harassment, discrimination, sexual content, and profanity. Our approach achieves an F1 score of 94.5% on the TextDetox dataset and demonstrates human-model agreement comparable to inter-human agreement. Our analysis reveals that 2.4% of all messages are classified as toxic, with notable differences across genres: streams of MOBA games exhibit the highest relative rate of toxicity (3.2%), and sports games show the lowest rate (2%). Furthermore, results indicate that individual games differ significantly in their toxicity distributions, even within genres, suggesting the existence of game-specific community norms and mechanics that shape toxic behavior beyond genre-level effects. These findings offer empirical insights into genre- and game-specific toxicity patterns on Twitch and can inform more targeted moderation strategies for gaming communities.
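The pipeline described in the abstract (zero-shot labeling of each message, then comparing toxic-message rates across genres) can be illustrated with a minimal sketch. This is not the authors' implementation. The prompt wording, the `none` label, and the helper names `build_prompt`, `parse_label`, `toxicity_rate_by_genre`, and `fake_llm` are all assumptions made for illustration, and only the four top-level categories named in the abstract are used, since the eight subclasses are not listed there.

```python
from collections import Counter
from typing import Callable, Iterable

# Top-level categories named in the abstract; the eight subclasses are not
# listed there, so they are omitted here.
CATEGORIES = ["harassment", "discrimination", "sexual content", "profanity"]


def build_prompt(message: str) -> str:
    """Build a zero-shot prompt: label names only, no labeled examples.

    The wording and the extra 'none' label are assumptions for this sketch.
    """
    labels = ", ".join(CATEGORIES + ["none"])
    return (
        "Classify the following Twitch chat message into exactly one of these "
        f"labels: {labels}.\n"
        f"Message: {message}\n"
        "Label:"
    )


def parse_label(raw: str) -> str:
    """Map free-form model output to a known label, defaulting to 'none'."""
    cleaned = raw.strip().lower()
    for category in CATEGORIES:
        if cleaned.startswith(category):
            return category
    return "none"


def toxicity_rate_by_genre(
    rows: Iterable[tuple[str, str]],
    llm: Callable[[str], str],
) -> dict[str, float]:
    """Return the share of messages labeled toxic for each genre.

    rows yields (genre, message) pairs; llm is any callable that takes a
    prompt string and returns the model's text output.
    """
    totals: Counter = Counter()
    toxic: Counter = Counter()
    for genre, message in rows:
        label = parse_label(llm(build_prompt(message)))
        totals[genre] += 1
        if label != "none":
            toxic[genre] += 1
    return {genre: toxic[genre] / totals[genre] for genre in totals}


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (keyword match only)."""
    message = prompt.split("Message: ", 1)[1].rsplit("\nLabel:", 1)[0]
    return "profanity" if "damn" in message.lower() else "none"


if __name__ == "__main__":
    sample = [
        ("MOBA", "damn that was bad"),
        ("MOBA", "gg wp"),
        ("Sports", "nice goal"),
        ("Sports", "what a save"),
    ]
    print(toxicity_rate_by_genre(sample, fake_llm))
```

In practice `fake_llm` would be replaced by a call to an actual pretrained model. The aggregation step stays the same, and grouping by game title instead of genre would give the per-game comparison the abstract mentions.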
Problem

Research questions and friction points this paper is trying to address.

toxicity
gaming communities
Twitch
online harassment
streaming platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Model
zero-shot classification
toxicity detection
online gaming communities
Twitch chat analysis