OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

📅 2026-01-12
📈 Citations: 2
Influential: 1
🤖 AI Summary
This work addresses two weaknesses of existing computer-using agents: limited robustness in long-horizon tasks and poor generalization to novel domains. These weaknesses stem from coarse-grained management of historical visual context and the absence of perception-aware tutorial retrieval. To overcome them, we propose OS-Symphony, a framework in which an orchestrator coordinates a reflective memory agent and versatile tool agents, enabling trajectory-level self-correction and real-time generation of visually aligned tutorials. Key innovations include a milestone-driven long-term memory mechanism that mitigates visual context loss, and a Multimodal Searcher built on the SeeAct paradigm that synthesizes high-fidelity tutorials within a browser sandbox. Our method achieves new state-of-the-art results on three online benchmarks, including a 65.84% success rate on OSWorld, and outperforms existing approaches across varying model scales.
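The summary does not spell out how milestone-driven memory decides which visual context to keep. The Python sketch below illustrates one plausible reading, purely as an assumption: screenshots are retained only for steps flagged as milestones and for a short window of recent steps, older steps survive as text only, and a crude repeated-action check stands in for trajectory-level reflection. The names `Step`, `MilestoneMemory`, `is_milestone`, `recent_window`, and `is_stuck` are hypothetical and not OS-Symphony's API; in the actual framework, milestone selection and reflection are presumably performed by a VLM rather than by fixed rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    index: int
    action: str          # textual action, e.g. "click(Apply)"
    screenshot: bytes    # placeholder for the raw screenshot
    is_milestone: bool   # hypothetical flag, e.g. set when a subgoal is verified


@dataclass
class MilestoneMemory:
    """Hypothetical milestone-driven memory (an assumption, not the paper's code)."""
    recent_window: int = 2
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> None:
        self.steps.append(step)

    def build_context(self) -> List[dict]:
        """Keep screenshots for milestones and the last `recent_window` steps;
        older non-milestone steps are reduced to their textual action."""
        cutoff = len(self.steps) - self.recent_window
        context = []
        for pos, s in enumerate(self.steps):
            keep_image = s.is_milestone or pos >= cutoff
            screenshot: Optional[bytes] = s.screenshot if keep_image else None
            context.append({"index": s.index, "action": s.action, "screenshot": screenshot})
        return context

    def is_stuck(self, repeats: int = 3) -> bool:
        """Crude trajectory-level check: the same action `repeats` times in a row."""
        if len(self.steps) < repeats:
            return False
        tail = [s.action for s in self.steps[-repeats:]]
        return len(set(tail)) == 1


mem = MilestoneMemory(recent_window=2)
actions = ["open(Settings)", "click(Display)", "click(Apply)", "click(Apply)", "click(Apply)"]
for i, a in enumerate(actions):
    mem.add(Step(i, a, b"img%d" % i, is_milestone=(i == 1)))

ctx = mem.build_context()
print([c["index"] for c in ctx if c["screenshot"] is not None])  # [1, 3, 4]
print(mem.is_stuck())  # True
```

Keeping milestone screenshots, rather than only the latest frames, lets a reflection step compare the current screen against earlier verified states; the cost is prompt size, which this sketch bounds with `recent_window`.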

📝 Abstract
While Vision-Language Models (VLMs) have significantly advanced Computer-Using Agents (CUAs), current frameworks struggle with robustness in long-horizon workflows and generalization in novel domains. These limitations stem from a lack of granular control over historical visual context curation and the absence of visual-aware tutorial retrieval. To bridge these gaps, we introduce OS-Symphony, a holistic framework that comprises an Orchestrator coordinating two key innovations for robust automation: (1) a Reflection-Memory Agent that utilizes milestone-driven long-term memory to enable trajectory-level self-correction, effectively mitigating visual context loss in long-horizon tasks; (2) Versatile Tool Agents featuring a Multimodal Searcher that adopts a SeeAct paradigm to navigate a browser-based sandbox to synthesize live, visually aligned tutorials, thereby resolving fidelity issues in unseen scenarios. Experimental results demonstrate that OS-Symphony delivers substantial performance gains across varying model scales, establishing new state-of-the-art results on three online benchmarks, notably achieving 65.84% on OSWorld.
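As a rough illustration of the coordination pattern the abstract describes, the following Python sketch runs an orchestrator loop that, at each step, routes control to a searcher (to fetch a tutorial), an actor (to act on the GUI), or a reflector (to review the trajectory). Everything here is a hypothetical stand-in: the agent names, the `orchestrate` function, and the rule-based `toy_decide` policy are assumptions for illustration only, whereas the paper's Orchestrator is VLM-driven and its agents operate on real screenshots and a browser sandbox.

```python
from typing import Callable, Dict, List

# Each hypothetical agent maps (task, history) to a textual result.
Agent = Callable[[str, List[str]], str]
Policy = Callable[[str, List[str]], str]


def orchestrate(task: str, agents: Dict[str, Agent], decide: Policy,
                max_steps: int = 10) -> List[str]:
    """Repeatedly ask `decide` which agent to call next, run it, and
    log its output until the policy returns 'done' or steps run out."""
    history: List[str] = []
    for _ in range(max_steps):
        choice = decide(task, history)
        if choice == "done":
            break
        result = agents[choice](task, history)
        history.append(f"{choice}: {result}")
    return history


def toy_decide(task: str, history: List[str]) -> str:
    """Rule-based stand-in for a VLM orchestrator (assumption)."""
    if not any(h.startswith("searcher") for h in history):
        return "searcher"   # no tutorial yet: retrieve one first
    n_actions = sum(h.startswith("actor") for h in history)
    if n_actions >= 2:
        return "done"       # the toy task needs exactly two GUI actions
    if history[-1].startswith("actor"):
        return "reflector"  # review the trajectory after each action
    return "actor"


agents: Dict[str, Agent] = {
    "searcher": lambda task, h: f"tutorial for '{task}'",
    "actor": lambda task, h: f"action #{sum(x.startswith('actor') for x in h) + 1}",
    "reflector": lambda task, h: "on track",
}

trace = orchestrate("enable dark mode", agents, toy_decide)
for line in trace:
    print(line)
# searcher: tutorial for 'enable dark mode'
# actor: action #1
# reflector: on track
# actor: action #2
```

Keeping the routing decision in one place means search and reflection agents can be added or swapped without changing the actor; in this toy policy, reflection simply runs after every action.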
Problem

Research questions and friction points this paper is trying to address.

Computer-Using Agents
robustness
generalization
long-horizon workflows
novel domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reflection-Memory Agent
Versatile Tool Agents
SeeAct paradigm
multimodal tutorial retrieval
long-horizon robustness