Interaction Theater: A Case of LLM Agents Interacting at Scale

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large-scale autonomous large language model (LLM) agents can generate meaningful dialogue in the absence of explicit coordination mechanisms. Leveraging real-world interaction data from the Moltbook platform—comprising 800,000 posts, 3.5 million comments, and 78,000 agents—the authors employ a multi-method evaluation framework integrating Jaccard keyword specificity, embedding-based semantic similarity, and LLM-as-judge assessments. Their analysis reveals that, despite high interaction volume, 65% of comments share no keywords with the post they appear under, only 5% of comments form threaded conversations, 28% are classified as spam, and 22% as off-topic. These findings show that without coordination mechanisms, LLM agents predominantly produce parallel outputs rather than engaging in substantive exchange, challenging the assumption that scale alone suffices for emergent intelligent communication.
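The keyword-specificity check described above can be sketched as a Jaccard overlap between the content vocabularies of a post and a comment. This is a minimal illustration, not the paper's exact implementation: the stopword list and tokenizer here are assumptions.

```python
import re

# Illustrative stopword list; the paper's actual filtering rules are not specified here.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in",
             "this", "that", "it", "for", "very", "so", "on"}

def content_tokens(text: str) -> set[str]:
    """Lowercase word tokens minus stopwords, approximating 'content vocabulary'."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}

def jaccard_specificity(post: str, comment: str) -> float:
    """Jaccard overlap between post and comment content vocabularies.

    A score of 0.0 means the comment shares no content keywords with the
    post -- the condition the study reports for 65% of comments.
    """
    p, c = content_tokens(post), content_tokens(comment)
    if not p or not c:
        return 0.0
    return len(p & c) / len(p | c)

post = "Benchmarking retrieval-augmented generation on long documents"
on_topic = "Retrieval quality on long documents really drives generation accuracy."
generic = "Great post! Thanks for sharing, insightful as always."

print(jaccard_specificity(post, on_topic))  # > 0: shares content keywords
print(jaccard_specificity(post, generic))   # 0.0: no shared content words
```

Under this metric, a well-formed but generic reply ("Great post! Thanks for sharing") scores exactly zero against any post, which is why lexical specificity separates surface fluency from actual engagement.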

📝 Abstract
As multi-agent architectures and agent-to-agent protocols proliferate, a fundamental question arises: what actually happens when autonomous LLM agents interact at scale? We study this question empirically using data from Moltbook, an AI-agent-only social platform, with 800K posts, 3.5M comments, and 78K agent profiles. We combine lexical metrics (Jaccard specificity), embedding-based semantic similarity, and LLM-as-judge validation to characterize agent interaction quality. Our findings reveal that agents produce diverse, well-formed text that creates the surface appearance of active discussion, but the substance is largely absent. Specifically, while most agents ($67.5\%$) vary their output across contexts, $65\%$ of comments share no distinguishing content vocabulary with the post they appear under, and information gain from additional comments decays rapidly. LLM-as-judge metrics classify the dominant comment types as spam ($28\%$) and off-topic content ($22\%$). Embedding-based semantic analysis confirms that lexically generic comments are also semantically generic. Agents rarely engage in threaded conversation ($5\%$ of comments), defaulting instead to independent top-level responses. We discuss implications for multi-agent interaction design, arguing that coordination mechanisms must be explicitly designed; without them, even large populations of capable agents produce parallel output rather than productive exchange.
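The abstract's claim that "lexically generic comments are also semantically generic" can be operationalized with embeddings: a comment is generic when it sits no closer to its own post than to the corpus-wide average. The sketch below uses plain cosine similarity on toy vectors; in practice the vectors would come from a sentence-embedding model, and the `margin` threshold and `is_semantically_generic` criterion are illustrative assumptions, not the paper's stated procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def is_semantically_generic(comment_vec, post_vec, corpus_centroid, margin=0.05):
    """Flag a comment as 'generic' when it is no closer to its own post than
    to the corpus centroid (plus a small margin) -- i.e. the comment would
    fit equally well under any post. Hypothetical criterion for illustration.
    """
    return cosine(comment_vec, post_vec) <= cosine(comment_vec, corpus_centroid) + margin

# Toy 3-d "embeddings": the post points one way, the corpus centroid another.
post_vec = [1.0, 0.0, 0.0]
centroid = [0.0, 1.0, 0.0]

on_topic = [0.9, 0.1, 0.0]   # aligned with its post
generic  = [0.0, 1.0, 0.0]   # aligned with the corpus average

print(is_semantically_generic(on_topic, post_vec, centroid))  # False
print(is_semantically_generic(generic, post_vec, centroid))   # True
```

Comparing against the corpus centroid rather than a fixed threshold controls for the fact that all outputs of the same model family are already somewhat similar to each other.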
Problem

Research questions and friction points this paper is trying to address.

LLM agents
multi-agent interaction
semantic content
interaction quality
agent communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent interaction
LLM agents
semantic analysis
LLM-as-judge
agent coordination