Multiple Weaks Win Single Strong: Large Language Models Ensemble Weak Reinforcement Learning Agents into a Supreme One

📅 2025-05-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In reinforcement learning (RL), existing ensemble methods—such as voting or static weighting—employ fixed aggregation strategies devoid of semantic understanding of task dynamics, limiting adaptability across diverse environments. To address this, we propose LLM-Ens, the first framework to integrate large language models (LLMs) into RL ensembles. LLM-Ens leverages an LLM to perform semantic state contextualization—partitioning environment states into semantically coherent contexts—and dynamically routes queries to specialized weak agents (e.g., policies trained with different algorithms or random seeds) based on their empirically estimated context-specific performance. This enables task-aware, adaptive agent selection and ensemble composition, overcoming the “semantic blindness” inherent in conventional ensembles. Evaluated on the Atari benchmark, LLM-Ens consistently outperforms state-of-the-art ensemble baselines, achieving up to a 20.9% improvement in mean normalized score. The implementation is open-sourced to ensure full reproducibility.

📝 Abstract
Model ensemble is a useful approach in reinforcement learning (RL) for training effective agents. Despite the wide success of RL, training effective agents remains difficult due to the multitude of factors requiring careful tuning, such as algorithm selection, hyperparameter settings, and even random seed choices, all of which can significantly influence an agent's performance. Model ensemble helps overcome this challenge by combining multiple weak agents into a single, more powerful one, enhancing overall performance. However, existing ensemble methods, such as majority voting and Boltzmann addition, are designed as fixed strategies and lack a semantic understanding of specific tasks, which limits their adaptability and effectiveness. To address this, we propose LLM-Ens, a novel approach that enhances RL model ensemble with task-specific semantic understanding driven by large language models (LLMs). Given a task, we first design an LLM to categorize states in the task into distinct 'situations', incorporating high-level descriptions of the task conditions. Then, we statistically analyze the strengths and weaknesses of each individual agent in the ensemble in each situation. At inference time, LLM-Ens dynamically identifies the changing task situation and switches to the agent that performs best in the current situation, ensuring dynamic model selection as task conditions evolve. Our approach is compatible with agents trained with different random seeds, hyperparameter settings, and various RL algorithms. Extensive experiments on the Atari benchmark show that LLM-Ens significantly improves the RL model ensemble, surpassing well-known baselines by up to 20.9%. For reproducibility, our code is open-source at https://anonymous.4open.science/r/LLM4RLensemble-F7EE.
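The two-stage pipeline the abstract describes — offline per-situation performance statistics for each weak agent, then situation-based agent switching at inference — can be sketched as follows. This is a minimal illustration, not the paper's implementation: all names are hypothetical, and the `categorize` callable stands in for the LLM that labels each state with a situation.

```python
from collections import defaultdict

def best_agents_from_stats(stats):
    """Build a routing table mapping each situation to its strongest agent.

    stats: {(agent_name, situation): mean_reward}, gathered offline by
    evaluating every weak agent under every LLM-identified situation.
    """
    per_situation = defaultdict(dict)
    for (agent, situation), reward in stats.items():
        per_situation[situation][agent] = reward
    # For each situation, keep the agent with the highest mean reward.
    return {s: max(r, key=r.get) for s, r in per_situation.items()}

def ensemble_action(state, agents, categorize, routing_table):
    """Act with the agent that performs best in the current situation.

    categorize(state) plays the role of the paper's LLM-driven
    situation identification; here it is any plain function.
    """
    situation = categorize(state)
    return agents[routing_table[situation]](state)
```

A toy usage with two weak agents and two situations:

```python
stats = {("a1", "low_ammo"): 3.0, ("a2", "low_ammo"): 5.0,
         ("a1", "safe"): 7.0, ("a2", "safe"): 2.0}
table = best_agents_from_stats(stats)   # a2 routes "low_ammo", a1 routes "safe"
agents = {"a1": lambda s: "retreat", "a2": lambda s: "attack"}
categorize = lambda s: "low_ammo" if s["ammo"] < 2 else "safe"
ensemble_action({"ammo": 1}, agents, categorize, table)  # -> "attack"
```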
Problem

Research questions and friction points this paper is trying to address.

Enhancing RL model ensemble with task-specific semantic understanding
Overcoming limitations of fixed ensemble strategies in reinforcement learning
Dynamically selecting best-performing agents based on task situations
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven task-specific semantic understanding for ensembles
Dynamic agent switching based on situation analysis
Compatible with diverse RL algorithms and settings
Yiwen Song
Department of Electronic Engineering, BNRist, Tsinghua University
Qianyue Hao
PhD Student, Department of Electronic Engineering, Tsinghua University
Reinforcement Learning · Large Language Models
Qingmin Liao
Department of Electronic Engineering, BNRist, Tsinghua University
Jian Yuan
Department of Electronic Engineering, BNRist, Tsinghua University
Yong Li
Department of Electronic Engineering, BNRist, Tsinghua University