TextArena

📅 2025-04-15

🤖 AI Summary

Existing benchmarks rarely assess dynamic social competencies such as negotiation, theory of mind, and deception, which limits rigorous evaluation of large language models (LLMs) in interactive social settings. To address this gap, the authors introduce TextArena: an open-source, modular, text-only arena of competitive games, comprising 57+ single-player, two-player, and multi-player environments in which LLM agents can be trained and evaluated through real-time online play. The key contributions are: (1) a systematic framework for evaluating dynamic social interaction; (2) TrueSkill-based dynamic ranking that supports real-time human–AI and AI–AI competition; and (3) standardized APIs, pluggable environment interfaces, and an interactive online playground. The platform is released as open source and is designed to support research on social reasoning and community-driven development.

📝 Abstract

TextArena is an open-source collection of competitive text-based games for training and evaluation of agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system (against humans and other submitted models) with real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social skills such as negotiation, theory of mind, and deception, creating a gap that TextArena addresses. Designed with research, community, and extensibility in mind, TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against the models, and training models. Detailed documentation of the environments, games, leaderboard, and examples is available at https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.
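
To make the idea of real-time TrueSkill scores concrete, here is a minimal sketch of a TrueSkill-style rating update for a single two-player game with a decisive result. It is an illustration under stated assumptions, not TextArena's leaderboard code: the names `Rating` and `rate_1v1` are hypothetical, draws and multi-player games are ignored, and the constants are the commonly used TrueSkill defaults (mu = 25, sigma = 25/3, beta = sigma/2, tau = sigma/100).

```python
import math
from dataclasses import dataclass

# Commonly used TrueSkill defaults (assumed here, not taken from TextArena).
MU0 = 25.0
SIGMA0 = MU0 / 3      # prior uncertainty about a new player's skill
BETA = SIGMA0 / 2     # per-game performance noise
TAU = SIGMA0 / 100    # small drift added before every game


@dataclass(frozen=True)
class Rating:
    """Gaussian belief about a player's skill (hypothetical helper type)."""
    mu: float = MU0
    sigma: float = SIGMA0


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def rate_1v1(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Update both ratings after `winner` beats `loser` (no draws)."""
    # Inflate each variance slightly so ratings never become frozen.
    var_w = winner.sigma ** 2 + TAU ** 2
    var_l = loser.sigma ** 2 + TAU ** 2
    c = math.sqrt(2 * BETA ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c
    # v: size of the mean shift; w: fraction of variance removed.
    v = _pdf(t) / max(_cdf(t), 1e-12)  # guard against underflow for huge upsets
    w = v * (v + t)
    new_winner = Rating(winner.mu + var_w / c * v,
                        math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_loser = Rating(loser.mu - var_l / c * v,
                       math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_winner, new_loser
```

Calling `rate_1v1(Rating(), Rating())` raises the winner's mean, lowers the loser's by the same amount, and shrinks both uncertainties. An upset (a lower-rated player beating a higher-rated one) moves the means further than an expected result, which is what lets a leaderboard of this kind react quickly to new submissions.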

Problem

Research questions and friction points this paper is trying to address.

Evaluating dynamic social skills in LLMs

Providing diverse competitive text-based games

Enabling easy model testing and training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source text-based games for LLM training

Online-play system with real-time TrueSkill scoring

Extensible framework for dynamic social skill evaluation
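
As a rough illustration of what a pluggable environment interface can look like, the sketch below registers a toy single-player text game and runs an agent against it. Everything here is hypothetical and self-contained: `ENV_REGISTRY`, `register`, `make`, `GuessNumberEnv`, `BinarySearchAgent`, and `play` are illustrative names, not TextArena's actual API, which is documented in the project repository.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping environment ids to constructors.
ENV_REGISTRY: Dict[str, Callable[[], object]] = {}


def register(env_id: str):
    """Class decorator that makes an environment available via make()."""
    def decorator(cls):
        ENV_REGISTRY[env_id] = cls
        return cls
    return decorator


def make(env_id: str):
    """Instantiate a registered environment by id."""
    return ENV_REGISTRY[env_id]()


@register("GuessNumber-v0")
class GuessNumberEnv:
    """Toy single-player game: guess a hidden integer from text feedback."""

    def __init__(self, secret: int = 7, max_turns: int = 5):
        self.secret, self.max_turns, self.turns = secret, max_turns, 0

    def reset(self) -> str:
        self.turns = 0
        return "Guess an integer from 1 to 10."

    def step(self, action: str) -> Tuple[str, float, bool]:
        """Return (observation, reward, done) for a text action."""
        self.turns += 1
        out_of_turns = self.turns >= self.max_turns
        if not action.strip().isdigit():
            return "Invalid move, send an integer.", 0.0, out_of_turns
        guess = int(action)
        if guess == self.secret:
            return "Correct.", 1.0, True
        hint = "Too low, guess higher." if guess < self.secret else "Too high, guess lower."
        return hint, 0.0, out_of_turns


class BinarySearchAgent:
    """Stand-in for an LLM: maps an observation string to an action string."""

    def __init__(self, lo: int = 1, hi: int = 10):
        self.lo, self.hi, self.last = lo, hi, None

    def __call__(self, observation: str) -> str:
        if "higher" in observation:
            self.lo = self.last + 1
        elif "lower" in observation:
            self.hi = self.last - 1
        self.last = (self.lo + self.hi) // 2
        return str(self.last)


def play(env, agent) -> float:
    """Run one episode and return the final reward."""
    obs, reward, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(agent(obs))
    return reward
```

With this shape, adding a game means writing one class with `reset` and `step` and decorating it, and any agent that maps text to text can be evaluated with `play(make("GuessNumber-v0"), BinarySearchAgent())`.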
