Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

📅 2025-08-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) excel at declarative knowledge reasoning but struggle to execute procedural behaviors effectively in interactive tasks, revealing a fundamental "knowing–doing" gap. To bridge this gap, the paper proposes Think in Games (TiG), a paradigm that integrates reinforcement learning (RL) policy optimization directly into the language modeling process. TiG unifies reasoning and action through language-guided action generation and online iterative optimization driven by environment feedback. Its core innovation is to formulate RL decision-making explicitly as a conditional language generation task, preserving interpretability while improving sample efficiency. Evaluated across diverse interactive benchmarks, TiG performs on par with conventional RL methods while generating natural-language explanations for its decisions, and it substantially reduces training data requirements and computational overhead compared to standard RL approaches.

📝 Abstract
Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple interactive tasks that young children perform effortlessly. This discrepancy highlights a critical gap between declarative knowledge (knowing about something) and procedural knowledge (knowing how to do something). Although traditional reinforcement learning (RL) agents can acquire procedural knowledge through environmental interaction, they often operate as black boxes and require substantial training data. In contrast, LLMs possess extensive world knowledge and reasoning capabilities, but are unable to effectively convert this static knowledge into dynamic decision-making in interactive settings. To address this challenge, we propose Think in Games (TiG), a novel framework that empowers LLMs to develop procedural understanding through direct interaction with game environments, while retaining their inherent reasoning and explanatory abilities. Specifically, TiG reformulates RL-based decision-making as a language modeling task: LLMs generate language-guided policies, which are refined iteratively through online reinforcement learning based on environmental feedback. Our experimental results show that TiG successfully bridges the gap between declarative and procedural knowledge, achieving competitive performance with dramatically lower data and computational demands compared to conventional RL methods. Moreover, TiG provides step-by-step natural language explanations for its decisions, greatly improving transparency and interpretability in complex interactive tasks.
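The abstract's key move is to recast RL-based decision-making as conditional language generation: the model scores natural-language action candidates given a state description, and environment feedback refines the policy online. The following is a minimal, self-contained sketch of that loop at toy scale, not the paper's implementation; the action strings, the `environment_reward` stand-in, and the REINFORCE-style update are all illustrative assumptions.

```python
import math
import random

# Candidate actions expressed in natural language, as in a
# language-guided policy (illustrative set, not from the paper).
ACTIONS = ["push lane", "defend tower", "retreat"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

class LanguagePolicy:
    """Scores action strings conditioned on a state string.

    A stand-in for an LLM head: one learnable logit per
    (state, action) pair, optimized online from reward.
    """
    def __init__(self):
        self.weights = {}

    def logits(self, state):
        return [self.weights.get((state, a), 0.0) for a in ACTIONS]

    def act(self, state, rng):
        probs = softmax(self.logits(state))
        return rng.choices(range(len(ACTIONS)), weights=probs)[0]

    def update(self, state, action_idx, reward, lr=0.5):
        # REINFORCE-style step: move log-prob of the sampled
        # action in proportion to the environment reward.
        probs = softmax(self.logits(state))
        for i, a in enumerate(ACTIONS):
            grad = (1.0 if i == action_idx else 0.0) - probs[i]
            key = (state, a)
            self.weights[key] = self.weights.get(key, 0.0) + lr * reward * grad

def environment_reward(state, action_idx):
    # Hypothetical game feedback: defending is correct when the
    # tower is under attack, pushing is correct otherwise.
    if state == "tower under attack":
        return 1.0 if ACTIONS[action_idx] == "defend tower" else -1.0
    return 1.0 if ACTIONS[action_idx] == "push lane" else -1.0

# Online iterative optimization from environment feedback.
rng = random.Random(0)
policy = LanguagePolicy()
for _ in range(300):
    state = rng.choice(["tower under attack", "lane open"])
    a = policy.act(state, rng)
    policy.update(state, a, environment_reward(state, a))

best = max(range(len(ACTIONS)),
           key=lambda i: policy.logits("tower under attack")[i])
print(ACTIONS[best])
```

Because the policy operates over action strings, each decision can in principle be paired with a generated rationale, which is the interpretability benefit the abstract highlights.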
Problem

Research questions and friction points this paper is trying to address.

Bridging declarative and procedural knowledge gap in LLMs
Enabling LLMs to make dynamic decisions in games
Combining reinforcement learning with language modeling for reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-guided policies via reinforcement learning
Iterative refinement through environmental feedback
Bridging declarative and procedural knowledge gap
Authors
Yi Liao (Tencent)
Yu Gu (Tencent)
Yuan Sui (PhD student, National University of Singapore; Natural Language Processing, Graphs)
Zining Zhu (Stevens Institute of Technology; Natural Language Processing, Explainable AI)
Yifan Lu (Tencent)
Guohua Tang (Tencent)
Zhongqian Sun (Tencent)
Wei Yang (Tencent)