Grounded Language Agent for Product Search via Intelligent Web Interactions

📅 2024-04-16
🏛️ CUSTOMNLP4U
📈 Citations: 1
Influential: 0
🤖 AI Summary
Large language models (e.g., GPT-4) incur high inference costs, require extensive prompt engineering, and generalize poorly when used in-context for web-based agent tasks such as e-commerce product search. Method: GLAINTEL, a lightweight trainable language agent built on Flan-T5, which can be trained in unsupervised, supervised, and unsupervised domain adaptation settings, addressing both learning without human demonstrations and leveraging demonstrations effectively when available. Contribution/Results: In unsupervised settings, GLAINTEL outperforms in-context learning approaches that employ models with up to 540 billion parameters. Behavioral cloning on human demonstrations alone does not beat the unsupervised variants, but combining demonstrations with reinforcement learning-based training yields results comparable to GPT-4-based methods at a fraction of the inference cost.

📝 Abstract
Recent research has focused on developing agents powered by large language models (LLMs) to accomplish complex high-level user intents. However, employing LLMs with billions of parameters (e.g., GPT-4) may incur substantial costs on top of handcrafting extensive prompts. To address this, we introduce a Grounded Language Agent for Intelligent Web Interactions, named GLAINTEL. GLAINTEL employs Flan-T5 as its backbone and is flexible in training in various settings: unsupervised learning, supervised learning, and unsupervised domain adaptation. Specifically, we tackle both the challenge of learning without human demonstrations and the opportunity to leverage human demonstrations effectively when those are available. Additionally, we explore unsupervised domain adaptation for cases where demonstrations are limited to a specific domain. Experimental evaluations across diverse setups demonstrate the effectiveness of GLAINTEL in unsupervised settings, outperforming in-context learning-based approaches that employ larger models with up to 540 billion parameters. Surprisingly, behavioral cloning-based methods that straightforwardly use human demonstrations do not outperform unsupervised variants of GLAINTEL. Additionally, we show that combining human demonstrations with reinforcement learning-based training yields results comparable to methods utilizing GPT-4. The code is available at: https://github.com/MultifacetedNLP/Web-Agents-Unsupervised
Problem

Research questions and friction points this paper addresses.

Intelligent Agents
Large Language Models
Adaptive Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

GLAINTEL
Unsupervised Learning
Human Demonstrations with Reinforcement Learning