Learning to Play Blackjack: A Curriculum Learning Perspective

📅 2026-03-31

🤖 AI Summary

This work addresses the low training efficiency and suboptimal performance that reinforcement learning agents often exhibit in complex environments. It proposes a novel framework in which a large language model dynamically generates an action-level curriculum: a multi-stage training path that progressively introduces more complex actions to Tabular Q-Learning and Deep Q-Network (DQN) agents playing Blackjack. Evaluated in an eight-deck Blackjack simulation over 10 independent runs, the curriculum-based approach raises the DQN agent's average win rate from 43.97% to 47.41%, lowers its average bust rate from 32.9% to 28.0%, and speeds up the overall workflow by more than 74%, so that the agent's full training finishes faster than the baseline's evaluation phase alone.
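The win-rate and bust-rate figures are absolute changes measured in percentage points. As a quick arithmetic check, the hypothetical helper `change` below (not part of the paper) converts each reported baseline/curriculum pair into its percentage-point difference and its change relative to the baseline.

```python
def change(before, after):
    """Return (absolute change in percentage points, relative change in percent)."""
    pp = after - before
    return round(pp, 2), round(100 * pp / before, 1)

win = change(43.97, 47.41)  # DQN average win rate: baseline -> curriculum
bust = change(32.9, 28.0)   # DQN average bust rate: baseline -> curriculum
print(win, bust)
```

The win-rate gain is 3.44 percentage points, about 7.8% relative to the baseline, and the bust rate falls by 4.9 points, about 14.9% relative to the baseline.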

📝 Abstract

Reinforcement Learning (RL) agents often struggle with efficiency and performance in complex environments. We propose a novel framework that uses a Large Language Model (LLM) to dynamically generate a curriculum over available actions, enabling the agent to incorporate each action individually. We apply this framework to the game of Blackjack, where the LLM creates a multi-stage training path that progressively introduces complex actions to a Tabular Q-Learning and a Deep Q-Network (DQN) agent. Our evaluation in a realistic 8-deck simulation over 10 independent runs demonstrates significant performance gains over standard training methods. The curriculum-based approach increases the DQN agent's average win rate from 43.97% to 47.41%, reduces the average bust rate from 32.9% to 28.0%, and accelerates the overall workflow by over 74%, with the agent's full training completing faster than the baseline's evaluation phase alone. These results validate that LLM-guided curricula can build more effective, robust, and efficient RL agents.
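The abstract does not specify the action set, the curriculum stages, or how the LLM is prompted, so the following is only a minimal sketch, under stated assumptions, of what action-level curriculum training can look like for the tabular agent. A hard-coded list (`CURRICULUM`) stands in for the LLM-generated curriculum. The actions (stand, hit, double), the infinite-deck draw used to approximate the 8-deck shoe, the dealer rule, the omission of splits, surrender, and 3:2 naturals, and all hyperparameters are illustrative assumptions rather than the authors' settings. The mechanism shown is that one Q-table persists across stages, while epsilon-greedy selection and the max in the Q-learning target range only over the actions unlocked so far.

```python
import random
from collections import defaultdict

# Hypothetical action encoding; the paper's exact action set is not given here.
STAND, HIT, DOUBLE = 0, 1, 2

# Stand-in for the LLM-generated curriculum: each stage lists the actions
# unlocked so far. Stage contents and episode counts are illustrative.
CURRICULUM = [
    {"actions": [STAND, HIT], "episodes": 20000},
    {"actions": [STAND, HIT, DOUBLE], "episodes": 20000},
]

def draw(rng):
    # Infinite-deck approximation of a multi-deck shoe: J/Q/K count 10, ace is 1.
    return min(rng.randint(1, 13), 10)

def hand_value(cards):
    # Returns (best total, whether an ace is currently counted as 11).
    total = sum(cards)
    usable_ace = 1 in cards and total + 10 <= 21
    return (total + 10 if usable_ace else total), usable_ace

def play_dealer(rng, dealer):
    # Dealer draws until reaching 17 or more (stands on all 17s, including soft 17).
    while hand_value(dealer)[0] < 17:
        dealer.append(draw(rng))
    return hand_value(dealer)[0]

def settle(player_total, dealer_total):
    # Outcome of a one-unit bet: +1 win, 0 push, -1 loss.
    if player_total > 21:
        return -1.0
    if dealer_total > 21 or player_total > dealer_total:
        return 1.0
    return 0.0 if player_total == dealer_total else -1.0

def state_of(player, dealer_up, first_move):
    total, usable = hand_value(player)
    return (total, usable, dealer_up, first_move)

def legal(stage_actions, first_move):
    # Curriculum mask intersected with game rules: doubling only on the first move.
    return [a for a in stage_actions if a != DOUBLE or first_move]

def run_episode(rng, Q, stage_actions, eps, alpha, gamma=1.0):
    player, dealer = [draw(rng), draw(rng)], [draw(rng), draw(rng)]
    first = True
    while True:
        s = state_of(player, dealer[0], first)
        acts = legal(stage_actions, first)
        # Epsilon-greedy restricted to the actions unlocked in this stage.
        a = rng.choice(acts) if rng.random() < eps else max(acts, key=lambda x: Q[s][x])
        if a == HIT:
            player.append(draw(rng))
            first = False
            if hand_value(player)[0] <= 21:
                s2 = state_of(player, dealer[0], first)
                # Bootstrap only from actions that are available in this stage.
                nxt = max(Q[s2][x] for x in legal(stage_actions, first))
                Q[s][a] += alpha * (gamma * nxt - Q[s][a])
                continue
            r = -1.0  # player busts
        else:
            bet = 1.0
            if a == DOUBLE:
                player.append(draw(rng))  # exactly one more card at twice the stake
                bet = 2.0
            p = hand_value(player)[0]
            r = -bet if p > 21 else bet * settle(p, play_dealer(rng, dealer))
        Q[s][a] += alpha * (r - Q[s][a])  # terminal update: no bootstrap term
        return r

def train(curriculum, seed=0, alpha=0.05, eps=0.1):
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0, 0.0, 0.0])  # one Q-table shared by all stages
    for stage in curriculum:
        for _ in range(stage["episodes"]):
            run_episode(rng, Q, stage["actions"], eps, alpha)
    return Q

def evaluate(Q, actions, episodes=20000, seed=1):
    # Greedy play with alpha=0, so evaluation leaves the Q-values unchanged.
    rng = random.Random(seed)
    return sum(run_episode(rng, Q, actions, 0.0, 0.0) for _ in range(episodes)) / episodes

if __name__ == "__main__":
    Q = train(CURRICULUM)
    print("mean return per hand:", evaluate(Q, [STAND, HIT, DOUBLE]))
```

Because a locked action is never selected or bootstrapped from, its Q-value stays at its initial value until its stage begins; the intended effect is that later stages refine a policy already learned over the simpler actions instead of exploring the full action set from scratch. A DQN variant would apply the same idea by masking the network's outputs for locked actions.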

Problem

Research questions and friction points this paper is trying to address: Reinforcement Learning, Curriculum Learning, Efficiency, Performance, Complex Environments.

Innovation

Methods, ideas, or system contributions that make the work stand out: Curriculum Learning, Large Language Model, Reinforcement Learning.