DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current deep research agents rely solely on final report scores for feedback, which makes it difficult to optimize their intermediate decisions. This work presents DRACULA, the first dataset of user preferences over intermediate actions, collected from expert researchers interacting with a deep research system and comprising 8,103 action preferences and 5,230 execution judgments. This fine-grained feedback reveals how users’ unstated goals influence action selection and motivates an online intervention that generates new actions based on a user's past interactions. Experiments show that giving large language models a user's full selection history improves their accuracy at predicting user-preferred actions more than self-reported or extrapolated user context does, and that users pick the intervention's actions most often in follow-up studies.

📝 Abstract

Scientific Deep Research (DR) agents answer user queries by synthesizing research papers into multi-section reports. User feedback can improve their utility, but existing protocols only score the final report, making it hard to study and learn which intermediate actions DR agents should take to improve reports. We collect DRACULA, the first dataset with user feedback on intermediate actions for DR. Over five weeks, nineteen expert CS researchers pose queries to a DR system that proposes actions (e.g., "Add a section on datasets"). Our users select the actions they prefer, then judge whether an output report applied their selections successfully, yielding 8,103 action preferences and 5,230 execution judgments. After confirming that a DR agent can execute DRACULA's actions, we study the predictability of user-preferred actions via simulation (how well LLMs predict the actions users select), a step toward learning to generate useful actions. We discover: (1) LLM judges initially struggle to predict action selections, but improve most when using a user's full selection history, rather than self-reported or extrapolated user context signals; (2) Users' selections for the same query differ based on unstated goals, bottlenecking simulation and motivating affordances that let users steer reports; and (3) Our simulation results inform an online intervention that generates new actions based on the user's past interactions, which users pick most often in follow-up studies. Overall, while prior work extensively studies execution, DRACULA reveals that a key challenge is deciding which actions to execute in the first place. We open-source DRACULA's study design, user feedback, and simulation tasks to spur future work on action feedback for long-horizon agents.
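To make the simulation task in the abstract concrete, here is a minimal Python sketch of how a predictor could be scored on action selections when it only sees the same user's earlier selections. Everything in it is an assumption for illustration: the `ActionPreference` record layout, the `selection_accuracy` function, and the toy `majority_history_baseline` (a stand-in for an LLM judge) are hypothetical and do not reflect DRACULA's released format or the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionPreference:
    """Hypothetical record: one proposed action and whether the user picked it."""
    user_id: str
    query: str
    action: str      # e.g. "Add a section on datasets"
    selected: bool   # True if the user selected this proposed action


def selection_accuracy(records, predict):
    """Fraction of records where predict(record, history) equals record.selected.

    `history` contains only the same user's earlier records, in order,
    so the predictor never sees the label it is asked to predict.
    """
    history_by_user = {}
    correct = 0
    for rec in records:
        history = history_by_user.setdefault(rec.user_id, [])
        if predict(rec, tuple(history)) == rec.selected:
            correct += 1
        history.append(rec)
    return correct / len(records) if records else 0.0


def majority_history_baseline(rec, history):
    """Toy stand-in for an LLM judge: predict the user's most common past choice."""
    if not history:
        return True  # assumed default when no history is available
    picked = sum(h.selected for h in history)
    return picked * 2 >= len(history)


# Made-up example records (not taken from the dataset).
records = [
    ActionPreference("u1", "q1", "Add a section on datasets", True),
    ActionPreference("u1", "q1", "Shorten the introduction", False),
    ActionPreference("u1", "q2", "Add a section on benchmarks", True),
    ActionPreference("u2", "q1", "Add a section on datasets", False),
]

accuracy = selection_accuracy(records, majority_history_baseline)
```

Swapping `majority_history_baseline` for a function that prompts an LLM with the history would give a comparable setup to the one the abstract describes, where the amount of user context passed to the predictor is the variable under study.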

Problem

Research questions and friction points this paper is trying to address.

Deep Research agents

intermediate actions

user feedback

action selection

long-horizon agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Research Agents

Action Feedback

User Preference Prediction