🤖 AI Summary
This work investigates the human-like behavioral capabilities of large language model (LLM)-driven autonomous agents—termed “digital players”—in complex strategy games, with emphasis on high-level cognitive tasks including numerical reasoning, multi-step planning, diplomatic negotiation, and deceptive social interaction. To this end, we develop an application-level evaluation platform built upon the open-source strategy game *Unciv*, and propose a data-flywheel evaluation paradigm specifically designed for digital players. We formally define and quantitatively assess LLM performance across three dimensions: long-term cooperation, dynamic strategic gameplay, and human-style response generation. The open-source *CivAgent* framework (available on GitHub) enables reproducible benchmarking. Experimental results reveal significant capability gaps in current state-of-the-art LLMs—particularly in sustained cooperative behavior and strategic deception—and identify concrete directions for improvement.
📝 Abstract
With the rapid advancement of Large Language Models (LLMs), LLM-based autonomous agents have shown the potential to function as digital employees, such as digital analysts, teachers, and programmers. In this paper, we develop an application-level testbed based on the open-source strategy game "Unciv", which has millions of active players, to enable researchers to build a "data flywheel" for studying human-like agents in the "digital players" task. This "Civilization"-like game features expansive decision-making spaces along with rich linguistic interactions such as diplomatic negotiations and acts of deception, posing significant challenges for LLM-based agents in terms of numerical reasoning and long-term planning. Another challenge for "digital players" is to generate human-like responses for social interaction, collaboration, and negotiation with human players. The open-source project can be found at https://github.com/fuxiAIlab/CivAgent.