Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Grounding Experts

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing autonomous web agents often fail when performing long-horizon tasks on complex, dynamic websites due to inaccurate element grounding, lack of site-specific procedural knowledge, and unstable memory mechanisms. This work proposes a multimodal web agent framework that integrates a Mixture of Grounding Experts for robust element localization, task planning informed by imitation of human operational experience, and a long-term tracking mechanism combining explicit task checklists with adaptive memory. The proposed approach substantially enhances the agent’s robustness and generalization across diverse real-world web environments, achieving a new state-of-the-art among open-source models on the Online-Mind2Web benchmark and matching the performance of leading closed-source systems.

📝 Abstract

Despite advances in multimodal large language models, autonomous web agents still struggle to reliably execute long-horizon tasks on complex and dynamic web interfaces. Existing agents often suffer from inaccurate element grounding, the absence of site-specific procedural knowledge, and unstable long-term task tracking and memory, particularly when operating over complex Document Object Model structures. To address these limitations, we introduce Avenir-Web, a web agent that achieves a new open-source state of the art on the Online-Mind2Web benchmark in real-world deployment. Avenir-Web leverages a Mixture of Grounding Experts, Experience-Imitation Planning for incorporating procedural priors, and a task-tracking checklist combined with adaptive memory to enable robust and seamless interaction across diverse user interface paradigms. We evaluate Avenir-Web on Online-Mind2Web, a rigorous benchmark of live and user-centered web tasks. Our results demonstrate that Avenir-Web significantly surpasses prior open-source agents and attains performance parity with top-tier proprietary models, thereby establishing a new open-source state of the art for reliable web agents on live websites.
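To make the idea of a Mixture of Grounding Experts more concrete, here is a minimal sketch in which several independent experts each propose a target location and the most confident proposal is selected. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not describe how Avenir-Web's experts are built or combined, so the names `Grounding`, `dom_expert`, `visual_expert`, and `ground`, the page dictionary layout, the fixed confidence values, and the threshold rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Grounding:
    """A candidate click target proposed by one expert (hypothetical type)."""
    expert: str
    x: int
    y: int
    confidence: float  # self-reported score in [0, 1]


# An expert maps (page observation, target description) to a proposal,
# or returns None when it cannot locate the element.
Expert = Callable[[dict, str], Optional[Grounding]]


def dom_expert(page: dict, target: str) -> Optional[Grounding]:
    """Hypothetical expert: exact, case-insensitive text match on DOM nodes."""
    for node in page.get("dom", []):
        if node["text"].lower() == target.lower():
            return Grounding("dom", node["x"], node["y"], 0.9)
    return None


def visual_expert(page: dict, target: str) -> Optional[Grounding]:
    """Hypothetical stand-in for a vision model: substring match on labeled boxes."""
    for box in page.get("boxes", []):
        if target.lower() in box["label"].lower():
            return Grounding("visual", box["x"], box["y"], 0.6)
    return None


def ground(
    page: dict,
    target: str,
    experts: Sequence[Expert],
    threshold: float = 0.5,
) -> Optional[Grounding]:
    """Query every expert and return the most confident proposal.

    Proposals below `threshold` are discarded; None means no expert
    produced a usable grounding (assumed selection rule, not the paper's).
    """
    proposals = [g for expert in experts if (g := expert(page, target)) is not None]
    viable = [g for g in proposals if g.confidence >= threshold]
    return max(viable, key=lambda g: g.confidence, default=None)


# Toy page observation with one DOM node and two labeled screen regions.
PAGE = {
    "dom": [{"text": "Search", "x": 10, "y": 20}],
    "boxes": [
        {"label": "Search button", "x": 12, "y": 22},
        {"label": "Checkout icon", "x": 300, "y": 40},
    ],
}
EXPERTS = [dom_expert, visual_expert]
```

In this toy setup, `ground(PAGE, "search", EXPERTS)` prefers the DOM proposal because it reports the higher confidence, while `ground(PAGE, "checkout", EXPERTS)` falls back to the visual expert because no DOM node matches. The point of the design is that a failure of one grounding strategy does not have to end the task as long as another expert can still locate the element.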

Problem

Research questions and friction points this paper is trying to address.

web agents

element grounding

long-horizon tasks

procedural knowledge

task tracking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Grounding Experts

Experience-Imitation Planning

Task-tracking Checklist