Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that large language model agents often struggle to balance task performance and user engagement in multi-turn proactive interactions: passive responses lack adaptability, while excessive reliance on human feedback diminishes user satisfaction. To this end, the authors propose the Behavioral Agentic Optimization (BAO) framework, which integrates behavior augmentation with behavior regularization. The former enhances the agent's proactive reasoning and information-seeking capabilities, while the latter suppresses redundant interactions to better align with user expectations. Built on agentic reinforcement learning (agentic RL), BAO jointly optimizes task efficiency and user experience on the UserRL benchmark, significantly outperforming existing proactive agentic RL baselines and achieving performance comparable to, or even surpassing, that of commercial large language model agents, thereby effectively advancing the Pareto frontier.

📝 Abstract
Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns, enabling efficient task completion beyond passive instruction following and making them essential for real-world, user-centric applications. Agentic reinforcement learning (RL) has recently emerged as a promising solution for training such agents in multi-turn settings, allowing interaction strategies to be learned from feedback. However, existing pipelines face a critical challenge in balancing task performance with user engagement: passive agents cannot efficiently adapt to users' intentions, while overuse of human feedback reduces their satisfaction. To address this trade-off, we propose BAO, an agentic RL framework that combines behavior enhancement, to enrich proactive reasoning and information-gathering capabilities, with behavior regularization, to suppress inefficient or redundant interactions and align agent behavior with user expectations. We evaluate BAO on multiple tasks from the UserRL benchmark suite and demonstrate that it substantially outperforms proactive agentic RL baselines while achieving comparable or even superior performance to commercial LLM agents, highlighting its effectiveness for training proactive, user-aligned LLM agents in complex multi-turn scenarios. Our website: https://proactive-agentic-rl.github.io/.
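To make the enhancement/regularization trade-off concrete, one common way such a combination is realized in agentic RL is reward shaping: a task-success reward is augmented with a penalty on redundant or over-budget user-facing queries. The sketch below is an illustrative assumption only; the paper does not publish its exact objective here, and every name and coefficient is hypothetical.

```python
# Hypothetical sketch of a BAO-style shaped reward: task performance is
# rewarded, while redundant or over-budget information-seeking turns are
# penalized (the "behavior regularization" side of the trade-off).
# All identifiers and coefficients are illustrative, not from the paper.

def shaped_reward(task_reward: float,
                  num_queries: int,
                  redundant_queries: int,
                  query_budget: int = 5,
                  penalty: float = 0.1) -> float:
    """Combine task success with a penalty that discourages
    redundant queries and queries beyond a per-episode budget."""
    over_budget = max(0, num_queries - query_budget)
    return task_reward - penalty * (redundant_queries + over_budget)


# Example: an efficient episode keeps the full task reward,
# while a chatty one with redundant turns is penalized.
efficient = shaped_reward(task_reward=1.0, num_queries=3, redundant_queries=0)
chatty = shaped_reward(task_reward=1.0, num_queries=7, redundant_queries=2)
```

Under this framing, tuning `penalty` moves the agent along the task-performance/user-engagement Pareto frontier: a larger penalty trades information-gathering turns for user satisfaction.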
Problem

Research questions and friction points this paper is trying to address.

proactive agents
task performance
user engagement
behavioral trade-off
multi-turn interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Behavioral Agentic Optimization
Proactive LLM Agents
Agentic Reinforcement Learning
Behavior Regularization
Pareto Frontier