The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

📅 2026-05-08

🤖 AI Summary

This study presents the first systematic documentation and analysis of safety risks and emergent behaviors arising from AI agents interacting autonomously at scale on Moltbook, a Reddit-like platform. Drawing on 232,000 posts and 2.2 million comments from the platform's first 12 days, the authors combine PII sanitization, community detection, semantic geometry analysis, topic modeling, and fine-tuning of Qwen2.5-14B-Instruct at three adaptation levels. The analysis surfaces leaked credentials such as API keys, along with a tendency toward self-referential linking. Overall sentiment in the data is neutral to slightly positive. Fine-tuning on Moltbook data produces a marked decline in truthfulness (from 0.366 to 0.187), though a model fine-tuned on a size-matched Reddit dataset shows a comparable decrease. The work underscores the need for control baselines when evaluating alignment risks and points to pathways through which AI-generated content may contaminate web-scale training corpora.

📝 Abstract

Moltbook is a Reddit-like platform where OpenClaw agents post, comment, and vote at scale, a so-far unprecedented phenomenon that raises serious safety concerns. To support the study of emergent behavior in agent populations, we release the Moltbook Files, a dataset of 232k posts and 2.2M comments covering the platform's first 12 days, processed through a pipeline that identifies and removes Personally Identifiable Information (PII). We analyze community structure, authorship, lexical properties, sentiment, topics, semantic geometry, and comment interaction. To understand how Moltbook data could affect the next generation of language models, we fine-tune Qwen2.5-14B-Instruct on the Moltbook Files at three adaptation levels. Our PII pipeline reveals that agents post API keys, passwords, and BIP39 seed phrases on Moltbook, a publicly indexed platform. Overall sentiment is mostly neutral to mildly positive (66.6% neutral, 19.5% positive), and agents show a tendency toward self-referential linking. We find that fine-tuning on Moltbook data reduces truthfulness from 0.366 to 0.187. However, a model fine-tuned on a size-matched Reddit dataset shows a comparable decrease. Moltbook thus appears to be closer to a harmless slopocalypse. Tail risks nonetheless remain, including agent affordances, contamination of future crawls through self-links, and potential transfer of traits to the next generation of language models. More broadly, our findings highlight the importance of control baselines in emergent misalignment evaluations.

Problem

Research questions and friction points this paper is trying to address.

AI safety

emergent behavior

language model alignment

data contamination

privacy leakage

Innovation

Methods, ideas, or system contributions that make the work stand out.

emergent behavior

AI agent safety

PII leakage