Nested Browser-Use Learning for Agentic Information Seeking

📅 2025-12-29

📈 Citations: 0

✨ Influential: 0

career value

184K/year

🤖 AI Summary

To address the bottleneck of information-seeking agents in accessing deep-web content, this paper proposes a nested browser operation framework that decouples action control from page exploration, enabling fine-grained, realistic web interactions. Unlike conventional API- or URL-level retrieval, the framework defines a minimal yet complete set of browser actions (e.g., click, scroll, form filling) and integrates them within a ReAct-style function-calling architecture, hierarchical state abstraction, and dynamic content summarization. This design facilitates end-to-end controllable browsing. Evaluated on multiple deep-retrieval benchmarks, the method achieves substantial improvements in task completion rate (+18.3%) and answer accuracy (+22.7%). Results demonstrate its robustness, generalizability, and effectiveness in modeling complex, multi-step interactive scenarios—particularly those requiring iterative navigation, dynamic content loading, and structured form submission.

Technology Category

Application Category

📝 Abstract

Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, limiting access to the richer information available through real browsing. While full browser interaction could unlock deeper capabilities, its fine-grained control and verbose page content returns introduce substantial complexity for ReAct-style function-calling agents. To bridge this gap, we propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure. This design simplifies agentic reasoning while enabling effective deep-web information acquisition. Empirical results on challenging deep IS benchmarks demonstrate that NestBrowse offers clear benefits in practice. Further in-depth analyses underscore its efficiency and flexibility.

Problem

Research questions and friction points this paper is trying to address.

Develops a browser-action framework for agents

Simplifies agentic reasoning in web information seeking

Enables effective deep-web information acquisition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Nested structure decouples control from exploration

Minimal browser-action framework simplifies agentic reasoning

Enables effective deep-web information acquisition through browsing

🔎 Similar Papers

NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild