IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency and suboptimal performance of autonomous deep research agents when handling ambiguous queries. To mitigate this, the authors propose an agent endowed with proactive clarification capabilities, which constructs a scalable “shallow-to-deep” intent refinement graph to generate high-quality dialogue data. A two-stage reinforcement learning strategy—comprising offline fine-tuning followed by online interaction with a user simulator—is designed to accurately identify user intent prior to initiating long-horizon research tasks. Experimental results demonstrate that the proposed approach significantly improves intent accuracy and downstream task performance, outperforming both the clarification modules of existing closed-source deep research agents and baseline proactive large language models.
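The "shallow-to-deep" intent refinement graph mentioned above can be pictured as a tree in which each seed query is expanded into progressively more specific intents, and each root-to-leaf path yields one clarification dialogue. The sketch below is a minimal illustration of that idea, not the paper's implementation; the refinement table, node structure, and all intent strings are invented for the example.

```python
# Hypothetical sketch of a "shallow-to-deep" intent refinement graph:
# seed queries are expanded into progressively more specific intents,
# and each root-to-leaf path becomes a clarification-dialogue turn sequence.
from dataclasses import dataclass, field

@dataclass
class IntentNode:
    intent: str                           # natural-language intent at this depth
    children: list = field(default_factory=list)

def refine(node: IntentNode, refinements: dict, depth: int, max_depth: int):
    """Recursively attach deeper (more specific) intents below `node`."""
    if depth >= max_depth:
        return
    for spec in refinements.get(node.intent, []):
        child = IntentNode(spec)
        node.children.append(child)
        refine(child, refinements, depth + 1, max_depth)

def paths_to_dialogues(root: IntentNode):
    """Turn every root-to-leaf path into a sequence of dialogue turns."""
    dialogues, stack = [], [(root, [root.intent])]
    while stack:
        node, path = stack.pop()
        if not node.children:
            dialogues.append(path)
        for child in node.children:
            stack.append((child, path + [child.intent]))
    return dialogues

# Toy refinement table (invented for illustration).
refinements = {
    "survey LLM agents": ["survey deep-research agents", "survey tool-use agents"],
    "survey deep-research agents": ["compare open vs. closed DR agents"],
}
root = IntentNode("survey LLM agents")
refine(root, refinements, depth=0, max_depth=2)
dialogues = paths_to_dialogues(root)
```

In the paper's setting, an LLM presumably generates the refinements; here they come from a fixed table so the expansion logic is visible in isolation.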

📝 Abstract
Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge by autonomously retrieving and synthesizing evidence from large web corpora into long-form reports, enabling a long-horizon agentic paradigm. However, unlike real-time conversational assistants, DR is computationally expensive and time-consuming, creating an autonomy-interaction dilemma: high autonomy on ambiguous user queries often leads to prolonged execution with unsatisfactory outcomes. To address this, we propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting long-horizon research. To overcome the scarcity of open-ended research data, we introduce a scalable pipeline that expands a few seed samples into high-quality dialogue turns via a shallow-to-deep intent refinement graph. We further adopt a two-stage reinforcement learning (RL) strategy: Stage I applies RL on offline dialogues to efficiently learn general user-interaction behavior, while Stage II uses the trained agent and a user simulator for online rollouts to strengthen adaptation to diverse user feedback. Extensive experiments show that IntentRL significantly improves both intent hit rate and downstream task performance, outperforming the built-in clarify modules of closed-source DR agents and proactive LLM baselines.
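The abstract's two-stage recipe (offline RL on pre-generated dialogues, then online rollouts against a user simulator) can be sketched with toy stand-ins. Everything below is an assumption for illustration: the scalar "specificity" policy, the reward, and the simulator are invented simplifications, not the paper's actual training objective.

```python
# Hypothetical sketch of the two-stage RL recipe from the abstract.
# Stage I updates the policy on fixed offline dialogues; Stage II rolls out
# live interactions against a user simulator. Policy and reward are toy stand-ins.
import random

random.seed(0)

def reward(question_specificity: float, true_intent_depth: float) -> float:
    # Toy reward: clarifying questions matching the user's intent depth score higher.
    return 1.0 - abs(question_specificity - true_intent_depth)

class ToyPolicy:
    """Stand-in policy: one scalar 'specificity' parameter, nudged toward reward."""
    def __init__(self):
        self.specificity = 0.0

    def update(self, target: float, lr: float = 0.5):
        self.specificity += lr * (target - self.specificity)

def stage1_offline(policy: ToyPolicy, offline_depths: list):
    for depth in offline_depths:              # fixed, pre-generated dialogue data
        policy.update(depth)

def stage2_online(policy: ToyPolicy, user_simulator, rollouts: int = 20):
    for _ in range(rollouts):                 # live rollouts vs. the simulator
        depth = user_simulator()
        if reward(policy.specificity, depth) < 1.0:
            policy.update(depth)

policy = ToyPolicy()
stage1_offline(policy, offline_depths=[0.6, 0.7, 0.8])
stage2_online(policy, user_simulator=lambda: random.uniform(0.6, 0.8))
```

The point of the structure, as the abstract frames it, is that Stage I cheaply instills general interaction behavior from static data, while Stage II adapts the agent to the variability of live user feedback.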
Problem

Research questions and friction points this paper is trying to address.

Deep Research
user intent
autonomy-interaction dilemma
proactive agents
open-ended research
Innovation

Methods, ideas, or system contributions that make the work stand out.

IntentRL
reinforcement learning
user intent clarification
deep research agents
dialogue data augmentation
Haohao Luo
Sun Yat-sen University
Zexi Li
Alibaba Group
Deep Learning · Large Language Models · Federated Learning
Yuexiang Xie
Alibaba Group
NLP · AutoML · Federated Learning
Wenhao Zhang
Tongyi Lab, Alibaba Group
Yaliang Li
Alibaba Group
Machine Learning
Ying Shen
Sun Yat-sen University