Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study

📅 2024-01-12

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

201K/year

🤖 AI Summary

Large language models (LLMs) and reinforcement learning (RL) agents suffer from weak planning capability and low sample efficiency when coupled via unidirectional enhancement—where LLMs merely guide RL without adaptive feedback. Method: We propose a teacher-student collaborative, recursive bidirectional feedback framework: the LLM acts as a high-level planner (“teacher”), while the RL agent (“student”) executes actions and returns real-time environmental signals; these signals dynamically recalibrate LLM token generation and enable RL to leverage linguistic abstractions for improved policy exploration. Contribution/Results: This establishes the first closed-loop paradigm of “I assist you, you assist me, and we co-evolve,” breaking unidirectional augmentation limitations. Evaluated on diverse complex planning tasks, our method improves LLM reasoning accuracy by +12.3% and RL sample efficiency by 3.8×, accelerates convergence of both components, and expands the solvable task boundary.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities for reinforcement learning (RL) models, such as planning and reasoning capabilities. However, the problems of LLMs and RL model collaboration still need to be solved. In this study, we employ a teacher-student learning framework to tackle these problems, specifically by offering feedback for LLMs using RL models and providing high-level information for RL models with LLMs in a cooperative multi-agent setting. Within this framework, the LLM acts as a teacher, while the RL model acts as a student. The two agents cooperatively assist each other through a process of recursive help, such as"I help you help I help."The LLM agent supplies abstract information to the RL agent, enabling efficient exploration and policy improvement. In turn, the RL agent offers feedback to the LLM agent, providing valuable, real-time information that helps generate more useful tokens. This bi-directional feedback loop promotes optimization, exploration, and mutual improvement for both agents, enabling them to accomplish increasingly challenging tasks. Remarkably, we propose a practical algorithm to address the problem and conduct empirical experiments to evaluate the effectiveness of our method.

Problem

Research questions and friction points this paper is trying to address.

Enhance collaboration between LLMs and RL models

Implement bi-directional feedback for mutual improvement

Develop a teacher-student framework for cooperative tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Teacher-student framework enhances LLM and RL collaboration

Bi-directional feedback loop optimizes mutual agent improvement

Algorithm proposed for practical implementation and evaluation

🔎 Similar Papers

No similar papers found.