🤖 AI Summary
This work identifies a critical vulnerability in mainstream personal AI agents, such as Claw, that rely on heartbeat-driven background execution: untrusted external content ingested in the background enters the same session memory used for user conversation and silently contaminates later interactions. The attack requires neither prompt injection nor adversarial inputs; ordinary social misinformation is enough to exploit an “exposure → memory → behavior” pathway and poison memory across sessions. Using MissClaw, a controlled social environment, together with memory tracing, long-term memory analysis, and context dilution experiments, the study finds that social consensus cues produce short-term misleading rates of up to 61%, that polluted content is promoted into long-term memory at rates of up to 91%, and that cross-session behavioral influence reaches 76%. Pollution still crosses session boundaries in realistic browsing scenarios.
📝 Abstract
We identify a critical security vulnerability in mainstream Claw personal AI agents: untrusted content encountered during heartbeat-driven background execution can silently pollute agent memory and subsequently influence user-facing behavior without the user's awareness. This vulnerability arises from an architectural design shared across the Claw ecosystem: heartbeat background execution runs in the same session as user-facing conversation, so content ingested from any external source monitored in the background (including email, message channels, news feeds, code repositories, and social platforms) can enter the same memory context used for foreground interaction, often with limited user visibility and without clear source provenance. We formalize this process as an Exposure (E) $\rightarrow$ Memory (M) $\rightarrow$ Behavior (B) pathway: misinformation encountered during heartbeat execution enters the agent's short-term session context, potentially gets written into long-term memory, and later shapes downstream user-facing behavior. We instantiate this pathway in an agent-native social setting using MissClaw, a controlled research replica of Moltbook. We find that (1) social credibility cues, especially perceived consensus, are the dominant driver of short-term behavioral influence, with misleading rates up to 61%; (2) routine memory-saving behavior can promote short-term pollution into durable long-term memory at rates up to 91%, with cross-session behavioral influence reaching 76%; (3) under naturalistic browsing with content dilution and context pruning, pollution still crosses session boundaries. Overall, prompt injection is not required: ordinary social misinformation is sufficient to silently shape agent memory and behavior under heartbeat-driven background execution.
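The Exposure $\rightarrow$ Memory $\rightarrow$ Behavior pathway can be illustrated with a minimal toy sketch. It is not the Claw implementation: the `Session` class, its methods, and the sample post are all hypothetical names chosen for illustration, and the only property it assumes from the abstract is that heartbeat ingestion and user conversation share one session context that is later saved without source filtering.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical toy session; not the real agent architecture."""
    context: list = field(default_factory=list)    # short-term session context
    long_term: list = field(default_factory=list)  # memory that survives sessions

    def heartbeat(self, external_items):
        # Exposure (E): background ingestion writes into the SAME context
        # that foreground conversation uses.
        for text in external_items:
            self.context.append(("heartbeat", text))

    def user_turn(self, text):
        # Foreground conversation shares the context with heartbeat content.
        self.context.append(("user", text))

    def save_memory(self):
        # Memory (M): routine saving copies entries and drops the source tag,
        # so provenance is lost (an assumption of this sketch).
        self.long_term.extend(text for _, text in self.context)

    def recall(self, keyword):
        # Behavior (B): later answers draw on anything stored, trusted or not.
        pool = [text for _, text in self.context] + self.long_term
        return [t for t in pool if keyword.lower() in t.lower()]


# Session 1: a misleading post is ingested in the background, then saved.
first = Session()
first.heartbeat(["Everyone agrees library X is deprecated"])
first.user_turn("hello")
first.save_memory()

# Session 2: a fresh context, but long-term memory carries over.
second = Session(long_term=list(first.long_term))
print(second.recall("library x"))
```

In this sketch the second session has never seen the post in its own context, yet recall still surfaces it, which mirrors the cross-session influence the abstract describes. Keeping the source tag through `save_memory` would be one way to preserve provenance, though the abstract does not state how any mitigation is implemented.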