Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai

📅 2026-01-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether toxic behavior exhibited by large language model (LLM) agents on AI-driven social platforms stems from external provocation or arises spontaneously. Leveraging large-scale interaction logs from Chirper.ai, we propose an operationalization of toxicity exposure grounded in observable behaviors and introduce two novel metrics—“influence-driven response rate” and “spontaneous response rate”—to quantify the trade-off between induced and spontaneous toxicity. Through toxicity detection, statistical modeling, and predictive analysis, we find that while exposure to toxic stimuli significantly increases the likelihood of subsequent toxic responses, a substantial portion of harmful content is generated spontaneously. Moreover, cumulative exposure to toxicity serves as a strong predictor of whether an agent ultimately produces harmful output.
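
The summary names the two metrics but does not give their formulas. Below is a minimal sketch of one plausible operationalization, assuming each logged stimulus-response pair carries binary toxicity labels; the `Interaction` type and the conditional-rate definitions are illustrative assumptions, not the paper's published definitions:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    stimulus_toxic: bool   # toxicity label of the post (stimulus)
    response_toxic: bool   # toxicity label of the comment (response)

def influence_metrics(interactions: list[Interaction]) -> tuple[float, float]:
    """Per-agent rates: toxic responses conditioned on the stimulus label.

    influence_rate:   P(toxic response | toxic stimulus)     -- induced toxicity
    spontaneous_rate: P(toxic response | non-toxic stimulus) -- spontaneous toxicity
    """
    induced_hits = sum(i.response_toxic for i in interactions if i.stimulus_toxic)
    induced_total = sum(i.stimulus_toxic for i in interactions)
    spont_hits = sum(i.response_toxic for i in interactions if not i.stimulus_toxic)
    spont_total = sum(not i.stimulus_toxic for i in interactions)
    influence_rate = induced_hits / induced_total if induced_total else 0.0
    spontaneous_rate = spont_hits / spont_total if spont_total else 0.0
    return influence_rate, spontaneous_rate

# Toy usage: one toxic reply to two toxic posts, one toxic reply to one clean post.
logs = [Interaction(True, True), Interaction(True, False), Interaction(False, True)]
print(influence_metrics(logs))  # (0.5, 1.0)
```

Under this reading, the reported negative correlation between the two rates would mean agents tend to be either reactive (toxic mostly when provoked) or spontaneously toxic, rather than both.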

📝 Abstract
Large Language Models (LLMs) are increasingly embedded in autonomous agents that engage, converse, and co-evolve on online social platforms. While prior work has documented the generation of toxic content by LLMs, far less is known about how exposure to harmful content shapes agent behavior over time, particularly in environments composed entirely of interacting AI agents. In this work, we study toxicity adoption by LLM-driven agents on Chirper.ai, a fully AI-driven social platform. Specifically, we model interactions in terms of stimuli (posts) and responses (comments). We conduct a large-scale empirical analysis of agent behavior, examining how toxic responses relate to toxic stimuli, how repeated exposure to toxicity affects the likelihood of toxic responses, and whether toxic behavior can be predicted from exposure alone. Our findings show that toxic responses are more likely following toxic stimuli and that cumulative toxic exposure, repeated over time, significantly increases the probability of a toxic response. We further introduce two influence metrics, revealing a strong negative correlation between induced and spontaneous toxicity. Finally, we show that the number of toxic stimuli alone enables accurate prediction of whether an agent will eventually produce toxic content. These results highlight exposure as a critical risk factor in the deployment of LLM agents, particularly as such agents operate in online environments where they may engage not only with other AI chatbots but also with human counterparts. Such engagement could trigger pernicious phenomena such as hate-speech propagation and cyberbullying. To reduce these risks, monitoring exposure to toxic content may provide a lightweight yet effective mechanism for auditing and mitigating harmful behavior in the wild.
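
The abstract's final claim is that the count of toxic stimuli an agent has received is, by itself, enough to predict whether the agent eventually produces toxic content. Below is a minimal sketch of such a one-feature classifier; the synthetic per-agent data (Poisson exposure counts, a logistic label process) is an illustrative assumption, not the paper's dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for per-agent data: cumulative count of toxic stimuli
# received (single feature) and whether the agent ever posted toxic content.
rng = np.random.default_rng(0)
exposure = rng.poisson(5, size=1000).reshape(-1, 1)
p_toxic = 1.0 / (1.0 + np.exp(-0.6 * (exposure.ravel() - 5)))
ever_toxic = (rng.random(1000) < p_toxic).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    exposure, ever_toxic, test_size=0.2, random_state=0)

# One-feature logistic regression: exposure count -> eventual toxicity.
clf = LogisticRegression().fit(X_train, y_train)
print("held-out AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```

In practice the exposure count would come from a toxicity classifier run over each agent's inbound stimuli, which is what makes exposure monitoring the lightweight audit signal the abstract proposes.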
Problem

Research questions and friction points this paper is trying to address.

toxicity adoption
AI agents
large language models
social ecosystems
harmful content exposure
Innovation

Methods, ideas, or system contributions that make the work stand out.

toxicity adoption
LLM-driven agents
AI social ecosystems
exposure modeling
influence metrics