ALLOY: Generating Reusable Agent Workflows from User Demonstration

📅 2025-10-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Prompt-based interaction with LLM agents struggles to capture users' procedural preferences in tasks such as travel planning or social media posting, and a prompt tuned for one task rarely transfers to similar tasks. Method: ALLOY is a programming-by-demonstration (PBD) framework for building agent workflows: users demonstrate the desired operations through natural interaction; the system synthesizes a visual, editable, structured workflow; and the learned workflow generalizes to variations of the task. The approach combines PBD theory, visual workflow modeling, and LLM-driven web automation, extending the PBD paradigm to the construction of LLM-based web agents. Contribution/Results: In a 12-participant user study, the demonstration-based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences on complex, multi-step web tasks. The study also indicates that demonstration-based interaction complements prompting rather than replacing it.

📝 Abstract

Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. However, prompt-based interaction faces critical limitations: Users often struggle to specify procedural requirements for tasks, especially those that don't have a factually correct solution but instead rely on personal preferences, such as posting social media content or planning a trip. Additionally, a "successful" prompt for one task may not be reusable or generalizable across similar tasks. We present ALLOY, a system inspired by classical HCI theories on Programming by Demonstration (PBD), but extended to enhance adaptability in creating LLM-based web agents. ALLOY enables users to express procedural preferences through natural demonstrations rather than prompts, while making these procedures transparent and editable through visualized workflows that can be generalized across task variations. In a study with 12 participants, ALLOY's demonstration-based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences in complex web tasks. Insights from the study also show how demonstration-based interaction complements the traditional prompt-based approach.

Problem

Research questions and friction points this paper is trying to address.

Generating reusable agent workflows from user demonstrations

Overcoming limitations of prompt-based task specification

Enhancing adaptability for LLM-based web agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates agent workflows from user demonstrations

Uses visualized workflows for transparency and editability

Enables generalization across task variations through demonstrations
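
To make the idea of a demonstrated, editable, reusable workflow concrete, here is a minimal sketch in Python. It is not ALLOY's implementation: the `Step`, `Workflow`, and `generalize` names, the action vocabulary, and the example URL are all hypothetical, and the generalization shown is plain string substitution rather than anything LLM-driven. It only illustrates how a recorded demonstration might be lifted into a parameterized procedure and rerun on a variation of the task.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded action; `target` and `value` may contain {placeholders}."""
    action: str  # hypothetical vocabulary, e.g. "open", "type", "click"
    target: str
    value: str = ""


@dataclass
class Workflow:
    """An ordered, editable list of steps with named parameters."""
    name: str
    steps: list[Step] = field(default_factory=list)

    def bind(self, **params: str) -> list[Step]:
        """Return concrete steps with every placeholder filled in."""
        return [
            Step(s.action, s.target.format(**params), s.value.format(**params))
            for s in self.steps
        ]


def generalize(demo: list[Step], params: dict[str, str]) -> list[Step]:
    """Replace demonstrated literals with placeholders.

    `params` maps a parameter name to the literal value seen in the demo.
    This is an assumed, simplified stand-in for real generalization.
    """
    def lift(text: str) -> str:
        for name, literal in params.items():
            text = text.replace(literal, "{" + name + "}")
        return text

    return [Step(s.action, lift(s.target), lift(s.value)) for s in demo]


# A demonstration recorded for one trip (illustrative values only).
demo = [
    Step("open", "https://example.com/flights"),
    Step("type", "destination", "Lisbon"),
    Step("click", "sort-by-price"),
]

# Lift the city into a parameter, then reuse the workflow for another trip.
workflow = Workflow("find_flight", generalize(demo, {"city": "Lisbon"}))
rerun = workflow.bind(city="Osaka")
```

Because the steps are ordinary data, a user could inspect, reorder, or delete them before rerunning, which is the kind of transparency and editability the highlights above describe. The preference encoded in the demonstration (sorting by price) carries over to the new destination unchanged.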
