QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

📅 2025-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing language agents struggle with complex interactive tasks because intermediate-step annotations are unavailable, forcing them to rely solely on sparse outcome-based rewards for global policy optimization, which leads to suboptimal decisions. Method: We propose a Q-guided stepwise search framework that constructs reasoning trees and models process-level rewards to implicitly estimate per-step Q-values, thereby generating Q-driven intermediate supervision signals. Crucially, this Q-guided stepwise annotation mechanism requires no explicit intermediate annotations, integrating stepwise search with language model inference optimization. Contribution/Results: Experiments demonstrate that the method retains strong performance with only ~50% of the annotated data, significantly improving reasoning performance and interpretability on long-horizon decision-making tasks. Qualitative analysis confirms more rational and transparent agent decisions, validating both efficacy and explainability.
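The summary's core idea, turning sparse trajectory-level outcome rewards into per-step Q-value annotations by backing them up through a reasoning tree, can be sketched as follows. This is an illustrative assumption of how such a backup might look; the node structure, the max-over-children rule, and the discount factor are stand-ins, not the paper's exact formulation.

```python
# Hypothetical sketch: back up sparse outcome rewards (available only at
# trajectory ends) into per-step Q-value annotations on a reasoning tree.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    action: str                              # action taken at this step
    children: List["TreeNode"] = field(default_factory=list)
    outcome_reward: Optional[float] = None   # set only on leaves
    q_value: float = 0.0                     # filled in by the backup pass


def backup_q(node: TreeNode, gamma: float = 1.0) -> float:
    """Estimate Q as the best (discounted) return reachable below this node."""
    if not node.children:
        # Leaf: the only supervision is the sparse outcome reward.
        node.q_value = node.outcome_reward or 0.0
    else:
        # Internal node: the best child determines this step's value.
        node.q_value = gamma * max(backup_q(c, gamma) for c in node.children)
    return node.q_value
```

Once every node carries a `q_value`, the (state, action, Q) triples can serve as the Q-driven intermediate supervision signals the summary describes, e.g. as training targets for a process reward model.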

📝 Abstract
Language agents have become a promising solution to complex interactive tasks. One of the key ingredients to the success of language agents is the reward model on the trajectory of the agentic workflow, which provides valuable guidance during training or inference. However, due to the lack of annotations of intermediate interactions, most existing works use an outcome reward model to optimize policies across entire trajectories. This may lead to sub-optimal policies and hinder the overall performance. To address this, we propose QLASS (Q-guided Language Agent Stepwise Search), to automatically generate annotations by estimating Q-values in a stepwise manner for open language agents. By introducing a reasoning tree and performing process reward modeling, QLASS provides effective intermediate guidance for each step. With the stepwise guidance, we propose a Q-guided generation strategy to enable language agents to better adapt to long-term value, resulting in significant performance improvement during model inference on complex interactive agent tasks. Notably, even with almost half the annotated data, QLASS retains strong performance, demonstrating its efficiency in handling limited supervision. We also empirically demonstrate that QLASS can lead to more effective decision making through qualitative analysis. We will release our code and data.
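The Q-guided generation strategy described in the abstract can be sketched as a best-of-N selection at each step: sample several candidate actions from the agent's policy, score each with the learned Q-value model, and commit to the highest-scoring one. This is a minimal sketch under stated assumptions; `propose` and `q_model` are hypothetical stand-in callables, not QLASS's actual interfaces.

```python
# Hypothetical sketch of Q-guided stepwise generation at inference time.
from typing import Callable, List


def q_guided_step(state: str,
                  propose: Callable[[str, int], List[str]],
                  q_model: Callable[[str, str], float],
                  num_candidates: int = 4) -> str:
    """Sample candidate actions and pick the one with the highest estimated Q."""
    candidates = propose(state, num_candidates)   # sample from the agent policy
    return max(candidates, key=lambda a: q_model(state, a))
```

Because the Q-value model estimates long-term return rather than immediate plausibility, this selection lets the agent favor steps that pay off later in the trajectory, which is the "adapt to long-term value" behavior the abstract highlights.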
Problem

Research questions and friction points this paper is trying to address.

Improve language agent inference
Generate intermediate Q-value annotations
Enhance decision-making in interactive tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Q-guided stepwise search
Process reward modeling
Reasoning tree implementation