🤖 AI Summary
This study presents the first systematic investigation of how the input data field of Ethereum transactions is used in practice as a decentralized peer-to-peer communication medium, through what the authors call Input Data Messages (IDMs). Leveraging on-chain data from the genesis block to February 2024, we identify 867,140 transactions whose IDMs contain informative natural language. Our methodology combines large-scale on-chain analytics, LLM-driven language identification, sentiment and topic classification, network graph modeling, and an analysis of regulatory implications. We uncover a cultural and functional split: English IDMs predominantly convey security and scam warnings with negative sentiment, whereas Chinese IDMs emphasize emotional expression and social connection with a more positive tone. IDM participants tend to form small, loosely connected communities (59.99%), and IDMs serve practical roles in scam response and fund recovery. We also expose significant content misuse risks and the regulatory challenges they raise. These findings provide an empirical foundation for on-chain communication governance, Web3 social behavior modeling, and cross-cultural blockchain regulation.
📝 Abstract
Can you imagine? Blockchain transactions can talk! In this paper, we study how they talk and what they talk about. We focus on the input data field of Ethereum transactions, which is designed to allow external callers to interact with smart contracts. In practice, this field also lets users embed natural language messages in transactions, and users can leverage these Input Data Messages (IDMs) for peer-to-peer communication. This means that, beyond its well-known role as a financial infrastructure, Ethereum also serves as a decentralized communication medium. We present the first large-scale analysis of Ethereum IDMs from the genesis block to February 2024 (3,134 days). We filter IDMs to extract 867,140 transactions with informative IDMs and use LLMs for language detection. We find that English (95.4%) and Chinese (4.4%) dominate the natural languages used in IDMs. Interestingly, English IDMs center on security and scam warnings (24%) with predominantly negative emotions, while Chinese IDMs emphasize emotional expression and social connection (44%) with a more positive tone. We also observe that longer English IDMs often accompany high-value ETH transfers made for protocol-level purposes, while longer Chinese IDMs tend to accompany symbolic transfer amounts that carry emotional intent. Moreover, we find that IDM participants tend to form small, loosely connected communities (59.99%). Our findings highlight culturally and functionally divergent use cases of the IDM channel across user communities. We further examine the security relevance of IDMs in on-chain attacks. Many victims use them to appeal to attackers for fund recovery, and IDMs containing negotiations or reward offers are linked to higher reply rates. We also analyze the regulatory implications of IDMs. Their misuse for abuse, threats, and sexual solicitation reveals the urgent need for content moderation and regulation in decentralized systems.
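To make the mechanism concrete: a transaction's input data is usually exposed as a hex string, and an IDM is simply that byte payload read as text. The sketch below shows one plausible way to recover such a message and discard payloads that look like ordinary contract calls. It is an illustration only, not the paper's filtering pipeline; the function name `decode_idm`, the `min_printable_ratio` threshold, and its default of 0.9 are hypothetical choices made for this example.

```python
from typing import Optional


def decode_idm(input_hex: str, min_printable_ratio: float = 0.9) -> Optional[str]:
    """Return the text embedded in a transaction's input data, or None.

    Hypothetical helper: the printable-ratio heuristic is an assumption
    for illustration, not the filter used in the paper.
    """
    # Input data is conventionally shown as a "0x"-prefixed hex string.
    hex_body = input_hex[2:] if input_hex.startswith(("0x", "0X")) else input_hex
    if not hex_body:
        return None  # empty input data, e.g. a plain ETH transfer

    try:
        # Both malformed hex and invalid UTF-8 raise ValueError
        # (UnicodeDecodeError is a subclass of ValueError).
        text = bytes.fromhex(hex_body).decode("utf-8")
    except ValueError:
        return None

    # Reject payloads that decode but consist mostly of control characters.
    printable = sum(ch.isprintable() or ch in "\n\t" for ch in text)
    if printable / len(text) < min_printable_ratio:
        return None
    return text


if __name__ == "__main__":
    message = "please return my funds"
    payload = "0x" + message.encode("utf-8").hex()
    print(decode_idm(payload))
```

A real pipeline would need more than this, for example handling messages appended after ABI-encoded call data and separating informative text from short or templated strings, which the abstract describes only as filtering for "informative IDMs".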