State of AI: An Empirical 100 Trillion Token Study with OpenRouter

📅 2026-01-15
📈 Citations: 3
Influential: 0
🤖 AI Summary
This study addresses the lack of systematic empirical research on real-world usage patterns of large language models (LLMs), particularly in light of their enhanced multi-step reasoning capabilities, which often yield behaviors diverging significantly from expectations. Leveraging over 100 trillion tokens of real user interaction logs from the OpenRouter platform, the work employs large-scale behavioral tracking, cohort retention analysis, and cross-model and cross-regional statistical methods across task types, geographic regions, and temporal dimensions. It reveals, for the first time, the widespread adoption of open-weight models, the dominance of role-playing applications, and key retention phenomena such as the “glass slipper effect.” Notably, creative role-playing and programming assistance substantially outpace traditional productivity tasks, and early adopters exhibit markedly higher retention—offering data-driven insights to inform the design, deployment, and evaluation of LLMs.

📝 Abstract
The past year has marked a turning point in the evolution and real-world use of large language models (LLMs). With the release of the first widely adopted reasoning model, o1, on December 5th, 2024, the field shifted from single-pass pattern generation to multi-step deliberation at inference time, accelerating deployment, experimentation, and new classes of applications. As this shift unfolded at a rapid pace, our empirical understanding of how these models have actually been used in practice has lagged behind. In this work, we leverage the OpenRouter platform, an AI inference provider that routes requests across a wide variety of LLMs, to analyze over 100 trillion tokens of real-world LLM interactions across tasks, geographies, and time. In our empirical study, we observe substantial adoption of open-weight models, the outsized popularity of creative roleplay (beyond just the productivity tasks many assume dominate) and coding assistance categories, plus the rise of agentic inference. Furthermore, our retention analysis identifies foundational cohorts: early users whose engagement persists far longer than later cohorts. We term this phenomenon the Cinderella "Glass Slipper" effect. These findings underscore that the way developers and end-users engage with LLMs "in the wild" is complex and multifaceted. We discuss implications for model builders, AI developers, and infrastructure providers, and outline how a data-driven understanding of usage can inform better design and deployment of LLM systems.
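
The cohort retention analysis described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual pipeline: it assumes interaction logs reduced to `(user_id, date)` events, buckets users into monthly cohorts by first appearance, and reports the fraction of each cohort still active N months later. All function and variable names here are hypothetical.

```python
# Minimal sketch of monthly cohort retention analysis over (user_id, date)
# interaction events. Illustrative only; names are hypothetical.
from collections import defaultdict
from datetime import date

def month_key(d: date) -> str:
    return f"{d.year}-{d.month:02d}"

def cohort_retention(events):
    """events: iterable of (user_id, date).
    Returns {cohort_month: {months_since_first_use: fraction_of_cohort_active}}."""
    first_seen = {}                 # user -> date of first interaction
    active = defaultdict(set)       # month -> users active that month
    for user, d in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(user, d)
        active[month_key(d)].add(user)

    cohorts = defaultdict(set)      # cohort month -> users who first appeared then
    for user, d in first_seen.items():
        cohorts[month_key(d)].add(user)

    retention = {}
    for cohort, users in cohorts.items():
        cy, cm = map(int, cohort.split("-"))
        row = {}
        for mkey, active_users in active.items():
            y, m = map(int, mkey.split("-"))
            offset = (y - cy) * 12 + (m - cm)  # months since cohort start
            if offset >= 0:
                row[offset] = len(users & active_users) / len(users)
        retention[cohort] = row
    return retention
```

A "Glass Slipper" pattern would show up as the earliest cohort's row decaying far more slowly than later cohorts' rows.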
Problem

Research questions and friction points this paper is trying to address.

large language models
empirical study
real-world usage
user engagement
LLM adoption
Innovation

Methods, ideas, or system contributions that make the work stand out.

empirical analysis
large language models
agentic inference
user retention
open-weight models
Malika Aubakirova
OpenRouter Inc. and a16z (Andreessen Horowitz)
Alex Atallah
OpenRouter Inc.
Chris Clark
Professor of Imaging and Biophysics, Institute of Child Health, University College London
Justin Summerville
OpenRouter Inc.
Anjney Midha
a16z (Andreessen Horowitz)