REALM: A Dataset of Real-World LLM Use Cases

📅 2025-03-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the lack of systematic empirical studies on real-world usage patterns of large language models (LLMs). It introduces REALM, the first large-scale dataset of real-world LLM usage, comprising over 94,000 annotated cases sourced from Reddit and news reports. Using multi-stage human annotation and a fine-grained taxonomy, the authors empirically characterize associations among LLM application domains, users’ occupational backgrounds, geographic locations, and demographic attributes. The analysis reveals statistically significant mappings between professions and application types: for instance, programmers predominantly use LLMs for code generation, while educators focus on pedagogical assistance. These findings establish a reproducible benchmark for assessing the societal impact, fairness, and domain-specific adaptation of LLMs. REALM is publicly released and has already been adopted by multiple research groups for downstream tasks.

📝 Abstract
Large Language Models (LLMs), such as the GPT series, have driven significant industrial applications, leading to economic and societal transformations. However, a comprehensive understanding of their real-world applications remains limited. To address this, we introduce REALM, a dataset of over 94,000 LLM use cases collected from Reddit and news articles. REALM captures two key dimensions: the diverse applications of LLMs and the demographics of their users. It categorizes LLM applications and explores how users' occupations relate to the types of applications they use. By integrating real-world data, REALM offers insights into LLM adoption across different domains, providing a foundation for future research on their evolving societal roles.

Problem

Research questions and friction points this paper is trying to address.

Understanding real-world applications of Large Language Models (LLMs).
Exploring diverse LLM uses and user demographics.
Analyzing occupation-based patterns in LLM application adoption.
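
The occupation-based analysis named in the last item can be pictured as a cross-tabulation of occupations against application categories. The following is only a minimal sketch under stated assumptions: the field names `occupation` and `application` are hypothetical, the records are toy values invented for illustration (they are not drawn from REALM), and the paper's actual schema and analysis method may differ.

```python
from collections import Counter, defaultdict

# Toy records for illustration only; they are NOT drawn from REALM.
# The field names "occupation" and "application" are hypothetical.
records = [
    {"occupation": "programmer", "application": "code generation"},
    {"occupation": "programmer", "application": "code generation"},
    {"occupation": "programmer", "application": "writing assistance"},
    {"occupation": "educator", "application": "pedagogical assistance"},
    {"occupation": "educator", "application": "writing assistance"},
    {"occupation": "educator", "application": "pedagogical assistance"},
]


def crosstab(rows):
    """Count how often each application category appears per occupation."""
    table = defaultdict(Counter)
    for row in rows:
        table[row["occupation"]][row["application"]] += 1
    return table


def top_application(table):
    """Return the most frequent application category for each occupation."""
    return {occ: counts.most_common(1)[0][0] for occ, counts in table.items()}


table = crosstab(records)
top = top_application(table)

# Print one line per occupation: its counts and its dominant category.
for occupation in sorted(table):
    print(occupation, dict(table[occupation]), "->", top[occupation])
```

A count table like this only shows which category dominates for each occupation; deciding whether an association is statistically meaningful would need a separate test on the real data.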

Innovation

Methods, ideas, or system contributions that make the work stand out.

REALM dataset with over 94,000 LLM use cases.
Categorizes LLM applications and user demographics.
Dashboard for visualizing real-world LLM adoption.