Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards

πŸ“… 2025-02-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Addressing the dual challenges of limited agent decision interpretability and high reward-annotation cost in human–AI coexistence scenarios, this paper proposes a model-agnostic framework for generating natural language explanations. Methodologically, it integrates a flow-matching generative model with the latent representations of a large language model (LLM), embedding linguistic explanation cues directly into reward modeling so that semantically aligned, dense rewards are generated automatically. Crucially, it requires no human reward annotations and jointly optimizes the explanation-generation and reinforcement learning objectives end to end. Empirically, the approach significantly improves explanation plausibility and faithfulness across diverse RL and LLM benchmarks while also enhancing downstream task performance, demonstrating strong generalization and training efficiency and offering a scalable route to interpretable, annotation-free reward learning.
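
The page gives no architectural details, but the mechanism the summary names admits a minimal sketch: a conditional flow-matching velocity field whose hidden layer is fused with LLM features of the explanation, trained with the standard linear-interpolant objective, with a dense reward read off the negative fitting error. The class and function names below (FlowMatchingRewardModel, flow_matching_loss, dense_reward) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FlowMatchingRewardModel(nn.Module):
    """Conditional flow-matching network. A hidden layer is merged with
    (frozen) LLM hidden states so linguistic cues from the explanation
    condition the learned vector field. Hypothetical architecture."""
    def __init__(self, state_dim: int, llm_dim: int, hidden: int = 256):
        super().__init__()
        self.inp = nn.Linear(state_dim + 1, hidden)      # state x_t plus time t
        self.fuse = nn.Linear(hidden + llm_dim, hidden)  # merge with LLM features
        self.out = nn.Linear(hidden, state_dim)          # predicted velocity

    def velocity(self, x_t, t, llm_h):
        h = torch.relu(self.inp(torch.cat([x_t, t], dim=-1)))
        h = torch.relu(self.fuse(torch.cat([h, llm_h], dim=-1)))
        return self.out(h)

def flow_matching_loss(model, x0, x1, llm_h):
    """Standard conditional flow-matching objective: regress the velocity
    field onto the straight-line target x1 - x0 along the interpolant."""
    t = torch.rand(x0.size(0), 1)
    x_t = (1 - t) * x0 + t * x1   # linear interpolant between endpoints
    target_v = x1 - x0
    pred_v = model.velocity(x_t, t, llm_h)
    return ((pred_v - target_v) ** 2).mean(dim=-1)  # per-sample loss

def dense_reward(model, x0, x1, llm_h):
    """One plausible reading of the paper: transitions that the
    explanation-conditioned flow explains well receive high reward."""
    with torch.no_grad():
        return -flow_matching_loss(model, x0, x1, llm_h)
```

The key design choice this sketch isolates is the fusion layer: the reward is dense (computable at every transition) and explanation-aware, because the velocity field cannot fit a transition well unless the LLM features it is conditioned on actually describe that behavior.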

πŸ“ Abstract
As humans increasingly share environments with diverse agents powered by RL, LLMs, and beyond, the ability to explain their policies in natural language will be vital for reliable coexistence. In this paper, we build a model-agnostic explanation generator based on an LLM. The technical novelty is that the rewards for training this LLM are generated by a generative flow matching model. This model has a specially designed structure with a hidden layer merged with an LLM to harness the linguistic cues of explanations when generating appropriate rewards. Experiments on both RL and LLM tasks demonstrate that our method can generate dense and effective rewards while saving on expensive human feedback; it thus enables effective explanations and even improves the accuracy of the decisions on the original tasks.
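
The abstract's claim that the method trains the explainer and the reward generator together also admits a compact sketch. Below is one hedged reading, reusing flow_matching_loss and dense_reward from the sketch above: the flow model is fit to observed transitions conditioned on the explainer's hidden states, and the explainer LLM is updated with a REINFORCE-style gradient on the resulting dense rewards. The sample_with_logprobs method and the batch keys are hypothetical, not the authors' API.

```python
import torch

def joint_training_step(explainer_llm, flow_model, batch, opt_llm, opt_flow):
    """One joint update, assuming the sketch above. All names illustrative."""
    # 1. Sample explanations; keep their log-probs and hidden features
    #    (hypothetical API on the explainer).
    expl, logp, llm_h = explainer_llm.sample_with_logprobs(batch["obs"])

    # 2. Fit the flow model on observed transitions, conditioned on the
    #    LLM representation of the explanation (detached so reward-model
    #    gradients do not flow back into the explainer here).
    fm_loss = flow_matching_loss(
        flow_model, batch["s"], batch["s_next"], llm_h.detach()
    )
    opt_flow.zero_grad()
    fm_loss.mean().backward()
    opt_flow.step()

    # 3. Dense rewards: how well the conditioned flow explains each step.
    r = dense_reward(flow_model, batch["s"], batch["s_next"], llm_h)

    # 4. REINFORCE-style update of the explainer, batch-mean baseline.
    adv = r - r.mean()
    pg_loss = -(adv.detach() * logp).mean()
    opt_llm.zero_grad()
    pg_loss.backward()
    opt_llm.step()
    return fm_loss.mean().item(), pg_loss.item()
```

This is a sketch of the training signal flow only; the paper may use a different RL algorithm or update schedule than the plain policy gradient shown here.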
Problem

Research questions and friction points this paper is trying to address.

Develop a model-agnostic explanation generator based on an LLM
Generate training rewards via flow matching
Improve decision accuracy while reducing reliance on human feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based explanation generator
Flow-matching generated rewards
Model-agnostic decision explanation
πŸ‘₯ Authors
Xinyi Yang
Department of Electronic Engineering, Tsinghua University, Beijing, China
Liang Zeng
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Heng Dong
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Chao Yu
Department of Electronic Engineering, Tsinghua University, Beijing, China
Xiaoran Wu
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Huazhong Yang
Professor of Electronics Engineering, Tsinghua University
VLSI circuits and systems, machine intelligence, wireless sensor networks, beyond-CMOS computing
Yu Wang
Department of Electronic Engineering, Tsinghua University, Beijing, China
Milind Tambe
Professor & Director, CRCS Center @ Harvard; Director, "AI for Social Good" @ Google Research
Multiagent Systems, Artificial Intelligence, AI for Social Good, AI for Public Health, AI for Conservation
Tonghan Wang
EconCS group, Harvard University
Multi-Agent Learning, Computational Economics, Reinforcement Learning