🤖 AI Summary
This study investigates whether sustained exposure to low-quality web text induces persistent cognitive degradation, termed “LLM Brain Rot,” in large language models (LLMs). Method: Using real-world Twitter/X data, the authors construct high-quality (control) and low-quality (“junk”) datasets, operationalizing data quality along two dimensions: user engagement (M1) and semantic quality (M2). Causal inference rests on Hedges’ *g* effect-size analysis and counterfactual error attribution. Contribution/Results: Continual pretraining on junk data causes significant, dose-dependent, and only partially reversible degradation in reasoning, long-context understanding, and safety across four LLMs (Hedges’ *g* > 0.3). “Thought-skipping,” a breakdown in the continuity of reasoning chains, is identified as the core mechanistic impairment. This work establishes data curation as a training-time safety intervention and provides multi-perspective causal evidence linking data quality to cognitive robustness in LLMs.
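For context, the following is a minimal Python sketch of how a Hedges’ *g* effect size (the *g* > 0.3 threshold cited above) is conventionally computed from two groups of benchmark scores; the score arrays below are illustrative placeholders, not the paper’s data.

```python
import numpy as np

def hedges_g(control_scores: np.ndarray, junk_scores: np.ndarray) -> float:
    """Standardized mean difference between two groups, with Hedges'
    small-sample bias correction (Cohen's d scaled by the factor J)."""
    n1, n2 = len(control_scores), len(junk_scores)
    s1, s2 = control_scores.std(ddof=1), junk_scores.std(ddof=1)
    # Pooled standard deviation across both groups
    pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (control_scores.mean() - junk_scores.mean()) / pooled_sd
    # Approximate small-sample correction factor J
    j = 1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)
    return d * j

# Hypothetical per-run benchmark scores for control- vs. junk-trained models
control = np.array([74.2, 75.1, 73.8])
junk = np.array([57.9, 56.4, 58.1])
print(f"Hedges' g = {hedges_g(control, junk):.2f}")
```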
📝 Abstract
We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). To causally isolate data quality, we run controlled experiments on real Twitter/X corpora, constructing junk and reverse-selected control datasets via two orthogonal operationalizations: M1 (engagement degree) and M2 (semantic quality), with token scale and training operations matched across conditions. Relative to the control group, continual pre-training of 4 LLMs on the junk dataset causes non-trivial declines (Hedges' $g > 0.3$) in reasoning, long-context understanding, and safety, while inflating "dark traits" (e.g., psychopathy, narcissism). Graded mixtures of junk and control data also yield dose-response cognitive decay: for example, under M1, ARC-Challenge with chain-of-thought drops $74.9 \rightarrow 57.2$ and RULER-CWE $84.4 \rightarrow 52.3$ as the junk ratio rises from $0\%$ to $100\%$.
Error forensics reveals several key insights. First, we identify thought-skipping as the primary lesion: models increasingly truncate or skip reasoning chains, which explains most of the error growth. Second, healing is partial but incomplete: scaling up instruction tuning and clean-data pre-training improves the degraded cognition yet cannot restore baseline capability, suggesting persistent representational drift rather than a format mismatch. Finally, we find that a tweet's popularity, a non-semantic metric, is a better indicator of the Brain Rot effect than its length in M1. Together, the results provide significant, multi-perspective evidence that data quality is a causal driver of LLM capability decay, reframing data curation for continual pretraining as a *training-time safety* problem and motivating routine "cognitive health checks" for deployed LLMs.
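To make the dose-response setup concrete, here is a small, hypothetical sketch of assembling token-matched junk/control mixtures at varying junk ratios for continual pretraining; the function, ratio grid, and whitespace token counting are illustrative assumptions, not the paper's actual pipeline.

```python
import random

def mix_corpora(junk_docs, control_docs, junk_ratio, token_budget, seed=0):
    """Assemble a pretraining mixture whose share of junk tokens is roughly
    `junk_ratio`, keeping the total token budget matched across conditions.
    Whitespace splitting stands in for a real tokenizer."""
    rng = random.Random(seed)

    def take(pool, budget):
        docs = list(pool)
        rng.shuffle(docs)
        picked, used = [], 0
        for doc in docs:
            if used >= budget:
                break
            picked.append(doc)
            used += len(doc.split())
        return picked

    junk_budget = int(token_budget * junk_ratio)
    mixture = take(junk_docs, junk_budget) + take(control_docs, token_budget - junk_budget)
    rng.shuffle(mixture)
    return mixture

# Hypothetical dose-response conditions: junk share rising from 0% to 100%.
# for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
#     corpus = mix_corpora(junk_tweets, control_tweets, ratio, token_budget=1_000_000)
```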