The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

📅 2026-05-08

🤖 AI Summary
This study systematically documents and analyzes the safety risks and emergent behaviors that arise when AI agents interact autonomously at scale on Moltbook, a Reddit-like platform. It draws on 232,000 posts and 2.2 million comments from the platform's first 12 days, sanitized for PII and analyzed through community detection, semantic geometry, and topic modeling, and it fine-tunes Qwen2.5-14B-Instruct at three adaptation levels. The PII pipeline surfaces leaked API keys, passwords, and BIP39 seed phrases, and the content shows a tendency toward self-referential linking. Overall sentiment on the platform is neutral to mildly positive. Fine-tuning on Moltbook data lowers truthfulness from 0.366 to 0.187, but a model fine-tuned on a size-matched Reddit dataset shows a comparable decrease. The work underscores the need for control baselines when evaluating alignment risks and points to pathways through which AI-generated content may contaminate web-scale training corpora.
📝 Abstract
Moltbook is a Reddit-like platform where OpenClaw agents post, comment, and vote at scale, a so far unprecedented incident that comes with serious safety concerns. With the aim of studying emergent behavior in populations, we release the Moltbook Files, a dataset of 232k posts and 2.2M comments covering the platform's first 12 days, processed through a pipeline to identify and remove Personally Identifiable Information (PII). We analyze community structure, authorship, lexical properties, sentiment, topics, semantic geometry, and comment interaction. To understand how Moltbook data could affect the next generation of language models, we fine-tune Qwen2.5-14B-Instruct on the Moltbook Files at three adaptation levels. Our PII pipeline reveals that agents post API keys, passwords, and BIP39 seed phrases on Moltbook, a publicly indexed platform. Overall sentiment is mostly neutral to mildly positive (66.6% neutral, 19.5% positive), and the content shows a tendency toward self-referential linking. We find that fine-tuning on Moltbook data reduces truthfulness from 0.366 to 0.187. However, a model fine-tuned on a size-matched Reddit dataset produces a comparable decrease. Moltbook thus seems to be more of a harmless slopocalypse. Tail risks nonetheless remain, including agent affordances, contamination of future crawls through self-links, and potential transfer of traits to the next generation of language models. More broadly, our findings highlight the importance of control baselines in emergent misalignment evaluations.
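The abstract does not describe how the PII pipeline works, so the following is only a minimal sketch of how the kinds of secrets it mentions (API keys, passwords, seed phrases) might be flagged in post text. Everything here is an assumption for illustration: the `sk-` key format, the `password:` pattern, and the helper names `redact_secrets` and `has_seed_phrase` are hypothetical, and the seed-phrase check expects the caller to supply a wordlist (for example, the BIP39 English wordlist).

```python
import re

# Illustrative patterns only (assumed, not the paper's); real key formats vary.
SECRET_PATTERNS = {
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}

REDACTION = "[REDACTED:{label}]"


def redact_secrets(text: str) -> tuple[str, list[str]]:
    """Replace pattern matches with a placeholder and report which labels fired."""
    found = []
    for label, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(REDACTION.format(label=label), text)
        if count:
            found.append(label)
    return text, found


def has_seed_phrase(text: str, wordlist: set[str], length: int = 12) -> bool:
    """Return True if `length` consecutive tokens all belong to `wordlist`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    run = 0
    for token in tokens:
        # Extend the current run on a wordlist hit, otherwise reset it.
        run = run + 1 if token in wordlist else 0
        if run >= length:
            return True
    return False
```

A production pipeline would likely need provider-specific key formats, checksum validation for seed phrases, and manual review of false positives; this sketch only illustrates the general pattern-matching idea.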
Problem

Research questions and friction points this paper is trying to address.

AI safety
emergent behavior
language model alignment
data contamination
privacy leakage
Innovation

Methods, ideas, or system contributions that make the work stand out.

emergent behavior
AI agent safety
PII leakage
fine-tuning contamination
control baselines