Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents

📅 2025-06-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates three AI paradigms—Transformer architectures, large language models (LLMs: Gemini, DeepSeek, GPT), and Proximal Policy Optimization (PPO) reinforcement learning—for solving *Da Vinci Code*, a board game combining logical reasoning with imperfect information. Methodologically, we propose a novel PPO framework grounded in a Transformer encoder to enable multi-step implicit policy learning, and design structured prompt engineering to enhance LLM reasoning consistency. Results show that the PPO agent achieves a win rate of 58.5% ± 1.0%, significantly outperforming all LLM variants; reveal a fundamental limitation of LLMs in maintaining long-horizon logical consistency; and empirically validate the efficacy of self-play-driven implicit strategy learning for reasoning under hidden information. This work establishes a new modeling paradigm and an empirical benchmark for AI research on logic-based imperfect-information games.

📝 Abstract
The Da Vinci Code, a game of logical deduction and imperfect information, presents unique challenges for artificial intelligence, demanding nuanced reasoning beyond simple pattern recognition. This paper investigates the efficacy of various AI paradigms in mastering this game. We develop and evaluate three distinct agent architectures: a Transformer-based baseline model with limited historical context, several Large Language Model (LLM) agents (including Gemini, DeepSeek, and GPT variants) guided by structured prompts, and an agent based on Proximal Policy Optimization (PPO) employing a Transformer encoder for comprehensive game history processing. Performance is benchmarked against the baseline, with the PPO-based agent demonstrating superior win rates ($58.5\% \pm 1.0\%$), significantly outperforming the LLM counterparts. Our analysis highlights the strengths of deep reinforcement learning in policy refinement for complex deductive tasks, particularly in learning implicit strategies from self-play. We also examine the capabilities and inherent limitations of current LLMs in maintaining strict logical consistency and strategic depth over extended gameplay, despite sophisticated prompting. This study contributes to the broader understanding of AI in recreational games involving hidden information and multi-step logical reasoning, offering insights into effective agent design and the comparative advantages of different AI approaches.
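The abstract's "structured prompts" for the LLM agents can be sketched roughly as below. The paper does not reproduce its prompt schema here, so the field names and the two-step instruction format are illustrative assumptions, not the authors' actual template:

```python
def build_turn_prompt(hand, revealed, hidden_slots):
    """Assemble a structured guessing-turn prompt for an LLM agent.

    All field names and instructions are illustrative assumptions,
    not the paper's actual prompt schema.
    """
    lines = [
        "You are playing Da Vinci Code. Tiles are numbered 0-11, black (B) or white (W).",
        "Each player's tiles are arranged in ascending order.",
        f"Your hand (sorted): {hand}",
        f"Opponent tiles already revealed: {revealed}",
        f"Opponent hidden slot indices: {hidden_slots}",
        "Step 1: list every tile value consistent with the sorted-order constraint for each hidden slot.",
        "Step 2: choose the (slot, value) guess most likely to be correct.",
        'Answer strictly as JSON: {"slot": <int>, "value": "<color><number>"}.',
    ]
    return "\n".join(lines)


prompt = build_turn_prompt(["B2", "W5"], {1: "B0"}, [0, 2, 3])
```

Constraining the answer to a fixed JSON shape is a common way to keep LLM outputs machine-parsable across a long game, which is one of the consistency issues the paper examines.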
Problem

Research questions and friction points this paper is trying to address.

Evaluating AI paradigms for mastering Da Vinci Code game
Comparing Transformer, LLM, and PPO agents' performance
Assessing logical consistency in AI for deductive tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer baseline with limited historical context
LLM agents guided by structured prompts
PPO agent with Transformer encoder history processing
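The PPO agent's policy update centers on the standard clipped surrogate objective. The paper's actual network (a Transformer encoder over the full game history) is not reproduced here; the NumPy sketch below shows only the generic PPO clipping step the agent would optimize, with all array values purely illustrative:

```python
import numpy as np


def ppo_clip_objective(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s); the clip keeps each update
    within a trust region of +/- clip_eps around the old policy.
    """
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the elementwise minimum makes the bound pessimistic.
    return np.minimum(unclipped, clipped).mean()


# Illustrative values: unchanged policy reduces to the mean advantage.
obj = ppo_clip_objective(np.log([0.5, 0.5]), np.log([0.5, 0.5]),
                         np.array([1.0, 3.0]))
```

In self-play training, the advantages would come from win/loss returns of completed games, which is how implicit deduction strategies can emerge without hand-coded logic rules.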
LeCheng Zhang
Westlake College, Westlake University
Yuanshi Wang
Westlake College, Westlake University
Haotian Shen
Hybrid Systems Lab, UC Berkeley
Xujie Wang
Westlake College, Westlake University