🤖 AI Summary
This work addresses a limitation of existing conversational GUI agents: they do not systematically use user preferences during multi-step, preference-driven tasks, so users may have to restart when earlier choices constrain later options. The authors propose MAESTRO, a framework that maintains a shared preference memory, extracts strength-aware user preferences from natural language, adapts the existing GUI in place (augmenting, sorting, filtering, and highlighting), and guides workflow navigation. MAESTRO’s key innovations are a preference-driven in-situ GUI adaptation mechanism and a conflict-aware backtracking strategy, which together extend the agent from executor to decision supporter. A user study (N=33) on a movie-booking Conversational Agent with GUI (CAG) reports that MAESTRO significantly outperforms the baseline, improving both task efficiency and user experience.
📝 Abstract
Modern task-oriented chatbots present GUI elements alongside natural-language dialogue, yet the agent's role has largely been limited to interpreting natural-language input as GUI actions and following a linear workflow. In preference-driven, multi-step tasks such as booking a flight or reserving a restaurant, earlier choices constrain later options and may force users to restart from scratch. User preferences serve as the key criteria for these decisions, yet existing agents do not systematically leverage them. We present MAESTRO, which extends the agent's role from execution to decision support. MAESTRO maintains a shared preference memory that extracts preferences from natural-language utterances with their strength, and provides two mechanisms. Preference-Grounded GUI Adaptation applies in-place operators (augment, sort, filter, and highlight) to the existing GUI according to preference strength, supporting within-stage comparison. Preference-Guided Workflow Navigation detects conflicts between preferences and available options, proposes backtracking, and records failed paths to avoid revisiting dead ends. We evaluated MAESTRO in a movie-booking Conversational Agent with GUI (CAG) through a within-subjects study with two conditions (Baseline vs. MAESTRO) and two modes (Text vs. Voice), with N = 33 participants.
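The two mechanisms described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`Strength`, `Preference`, `Option`, `adapt`, `check_stage`, `viable`) is hypothetical. It assumes a two-level strength scale, and it assumes that hard preferences drive filtering while soft preferences drive sorting and highlighting; the abstract says only that operators are applied "according to preference strength". The augment operator is omitted.

```python
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    """Assumed two-level scale; the actual strength scale is not given here."""
    SOFT = 1  # e.g. "I'd rather sit on the aisle"
    HARD = 2  # e.g. "It has to be IMAX"


@dataclass(frozen=True)
class Preference:
    attribute: str
    value: str
    strength: Strength


@dataclass
class Option:
    name: str
    attributes: dict
    highlighted: bool = False


def adapt(options, memory):
    """Apply filter, highlight, and sort to one stage's options."""
    hard = [p for p in memory if p.strength is Strength.HARD]
    soft = [p for p in memory if p.strength is Strength.SOFT]

    def soft_score(option):
        # Number of soft preferences this option satisfies.
        return sum(option.attributes.get(p.attribute) == p.value for p in soft)

    # Filter: keep only options that satisfy every hard preference.
    kept = [o for o in options
            if all(o.attributes.get(p.attribute) == p.value for p in hard)]
    # Highlight: mark options that match at least one soft preference.
    adapted = [Option(o.name, o.attributes, soft_score(o) > 0) for o in kept]
    # Sort: best soft matches first; ties keep their original order.
    adapted.sort(key=soft_score, reverse=True)
    return adapted


def check_stage(path, options, memory, failed_paths):
    """Detect a conflict at the current stage.

    `path` is the tuple of choices made so far. If no option survives
    the hard preferences, the path is recorded as a dead end and
    backtracking is proposed.
    """
    adapted = adapt(options, memory)
    if not adapted:
        failed_paths.add(path)
        return "backtrack", []
    return "proceed", adapted


def viable(choices, path, failed_paths):
    """Drop choices that would re-enter a recorded dead end."""
    return [c for c in choices if path + (c,) not in failed_paths]
```

In a movie-booking flow, `adapt` would run whenever a stage (showtime, seat) is rendered, and `viable` would run after a backtrack so that the user is not steered back into a path that has already failed.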