General Modular Harness for LLM Agents in Multi-Turn Gaming Environments

📅 2025-07-15

🤖 AI Summary

This work addresses the weak generalization and heavy reliance on domain-specific engineering exhibited by large language models (LLMs) and vision-language models (VLMs) in multi-turn interactive game environments. We propose a general-purpose modular harness that separates perception, memory, and reasoning into independent components, so that a single LLM or VLM backbone can be applied across games without task-specific customization. Evaluated under a unified workflow on classic and modern game suites, the harness exposes distinct component contributions: memory dominates performance gains in long-horizon puzzles, while perception is critical in visually noisy arcade games. Experiments show consistent gains over un-harnessed baselines across the tested games. The design offers a modular basis for studying and building general-purpose agents.

📝 Abstract

We introduce a modular harness design for LLM agents that is composed of perception, memory, and reasoning components, enabling a single LLM or VLM backbone to tackle a wide spectrum of multi-turn gaming environments without domain-specific engineering. Using classic and modern game suites as low-barrier, high-diversity testbeds, our framework provides a unified workflow for analyzing how each module affects performance across dynamic interactive settings. Extensive experiments demonstrate that the harness lifts gameplay performance consistently over un-harnessed baselines and reveal distinct contribution patterns: for example, memory dominates in long-horizon puzzles, while perception is critical in vision-noisy arcade games. These findings highlight the effectiveness of our modular harness design in advancing general-purpose agents, given the familiarity and ubiquity of games in everyday human experience.
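The composition described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's implementation: every name here (`Harness`, `Memory`, `perceive`, `stub_backbone`) is hypothetical, the backbone is a stub function standing in for an LLM or VLM call, and the reasoning step is reduced to a single prompt-and-reply call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# All names below are hypothetical and chosen for illustration only.

Grid = List[List[str]]


@dataclass
class Memory:
    """Keeps a bounded window of recent turn records."""
    max_items: int = 5
    items: List[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.items.append(entry)
        # Drop the oldest entries once the window is full.
        self.items = self.items[-self.max_items:]

    def summary(self) -> str:
        return "\n".join(self.items)


def perceive(raw_state: Grid) -> str:
    """Perception: turn a raw character grid into a text description."""
    return "\n".join("".join(row) for row in raw_state)


@dataclass
class Harness:
    """Wraps one backbone with optional perception and memory modules."""
    backbone: Callable[[str], str]            # stand-in for an LLM/VLM call
    perception: Optional[Callable[[Grid], str]] = None
    memory: Optional[Memory] = None

    def act(self, raw_state: Grid) -> str:
        # Without a perception module the backbone sees the raw state as-is.
        obs = self.perception(raw_state) if self.perception else repr(raw_state)
        parts = []
        if self.memory is not None and self.memory.items:
            parts.append("History:\n" + self.memory.summary())
        parts.append("Observation:\n" + obs)
        parts.append("Choose the next action.")
        # Reasoning step: a single backbone call on the assembled prompt.
        action = self.backbone("\n\n".join(parts)).strip()
        if self.memory is not None:
            self.memory.add(f"obs={obs!r} action={action}")
        return action


def stub_backbone(prompt: str) -> str:
    """Toy backbone: answers differently once history is present."""
    return "up" if "History:" in prompt else "left"
```

Because each module is an optional field, the same `Harness` class covers the un-harnessed case (`Harness(backbone=stub_backbone)`) and any combination of modules, which is what makes per-module comparisons straightforward in a design like this.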

Problem

Research questions and friction points this paper is trying to address.

Design modular harness for LLM agents in gaming environments

Analyze module impact on performance in dynamic settings

Improve gameplay performance over un-harnessed baselines
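One way to frame the module-impact question above is as an ablation grid over module switches. The sketch below is a hedged illustration, not the paper's evaluation code: `ablation_grid`, `module_gains`, and `toy_evaluate` are hypothetical names, and the numbers produced by `toy_evaluate` are placeholders, not results from the paper.

```python
from itertools import product
from typing import Callable, Dict, Tuple

# (use_perception, use_memory) switches; names are hypothetical.
Config = Tuple[bool, bool]


def ablation_grid(evaluate: Callable[[bool, bool], float]) -> Dict[Config, float]:
    """Score every on/off combination of the two optional modules."""
    return {(p, m): evaluate(p, m) for p, m in product([False, True], repeat=2)}


def module_gains(scores: Dict[Config, float]) -> Dict[str, float]:
    """Report each configuration's gain over the un-harnessed baseline."""
    base = scores[(False, False)]
    return {
        "perception_only": scores[(True, False)] - base,
        "memory_only": scores[(False, True)] - base,
        "both": scores[(True, True)] - base,
    }


def toy_evaluate(use_perception: bool, use_memory: bool) -> float:
    """Placeholder scorer; a real one would run game episodes and average scores."""
    return 1.0 + 2.0 * use_perception + 3.0 * use_memory
```

In practice `toy_evaluate` would be replaced by a function that runs the agent on a given game with the selected modules enabled, and the grid would be computed per game so that contribution patterns can be compared across game types.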

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular harness with perception, memory, reasoning

Unified workflow for multi-turn gaming analysis

Boosts performance in diverse game environments
