Reward Evolution with Graph-of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In reinforcement learning, reward function design relies heavily on human expertise and struggles to generalize across complex, multi-step tasks. To address this, the authors propose RE-GoT (Reward Evolution with Graph-of-Thoughts), a bi-level automated reward evolution framework integrating large language models (LLMs) and vision-language models (VLMs). Its core innovation is the Graph-of-Thoughts mechanism, which formalizes tasks as structured graphs with textual node and edge attributes; VLMs provide visual feedback on rollouts to guide iterative, autonomous reward refinement. The method unifies LLM-based task decomposition, graph-structured reasoning, and online RL evaluation, requiring no human intervention. Evaluated on 10 RoboGen and 4 ManiSkill2 benchmark tasks, RE-GoT improves average success rates on RoboGen by 32.25% and reaches a 93.73% average success rate on ManiSkill2, significantly outperforming both existing LLM-based approaches and expert-designed rewards.

📝 Abstract
Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches are limited by hallucinations, reliance on human feedback, and challenges with handling complex, multi-step tasks. In this work, we introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT first decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention. Extensive experiments on 10 RoboGen and 4 ManiSkill2 tasks demonstrate that RE-GoT consistently outperforms existing LLM-based baselines. On RoboGen, our method improves average task success rates by 32.25%, with notable gains on complex multi-step tasks. On ManiSkill2, RE-GoT achieves an average success rate of 93.73% across four diverse manipulation tasks, significantly surpassing prior LLM-based approaches and even exceeding expert-designed rewards. Our results indicate that combining LLMs and VLMs with graph-of-thoughts reasoning provides a scalable and effective solution for autonomous reward evolution in RL.
Problem

Research questions and friction points this paper is trying to address.

Automating reward function design in reinforcement learning
Overcoming LLM limitations like hallucinations and human dependency
Enhancing complex multi-step task performance with autonomous refinement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-of-Thoughts for structured reasoning
Bi-level framework with automated VLM evaluation
Iterative reward refinement without human intervention
Changwei Yao
Carnegie Mellon University
Xinzi Liu
University of Tokyo
Chen Li
Carnegie Mellon University
Marios Savvides
Professor of Electrical and Computer Engineering, Carnegie Mellon University
Pattern Recognition, Biometrics, Face Recognition, Iris Recognition, Tensor