LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS

📅 2025-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To bridge the semantic gap between large language models (LLMs) and graphical user interfaces (GUIs), this paper introduces AIOS 1.0, a platform that treats contextualization of the computing environment as the core paradigm for building LLM-native environments. AIOS 1.0 implements a Model Context Protocol (MCP) server architecture that abstracts computer states and actions, decoupling interface complexity from high-level decision logic. The platform combines a Python-based system abstraction layer with a lightweight agent architecture and is evaluated on the OSWorld benchmark, where the lightweight computer-use agent LiteCUA achieves a 14.66% task success rate, outperforming several specialized agent frameworks. The LiteCUA source code is publicly released and integrated into the AIOS main branch.

📝 Abstract

We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powerful agent frameworks or enhancing agent models, we identify a fundamental limitation: the semantic disconnect between how language models understand the world and how computer interfaces are structured. AIOS 1.0 addresses this challenge by transforming computers into contextual environments that language models can natively comprehend, implementing a Model Context Protocol (MCP) server architecture to abstract computer states and actions. This approach effectively decouples interface complexity from decision complexity, enabling agents to reason more effectively about computing environments. To demonstrate our platform's effectiveness, we introduce LiteCUA, a lightweight computer-use agent built on AIOS 1.0 that achieves a 14.66% success rate on the OSWorld benchmark, outperforming several specialized agent frameworks despite its simple architecture. Our results suggest that contextualizing computer environments for language models represents a promising direction for developing more capable computer-use agents and advancing toward AI that can interact with digital systems. The source code of LiteCUA is available at https://github.com/agiresearch/LiteCUA, and it is also integrated into the AIOS main branch as part of AIOS at https://github.com/agiresearch/AIOS.

Problem

Research questions and friction points this paper is trying to address.

Bridges semantic gap between language models and computer interfaces

Abstracts computer states and actions via MCP server architecture

Enhances agent reasoning in computing environments through contextualization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms computers into contextual environments for models

Implements Model Context Protocol server architecture

Decouples interface complexity from decision complexity
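The idea of exposing a computer as a server of states and actions can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen for illustration: `ToyComputer`, `ContextServer`, the tool names `get_state`, `focus`, and `click`, and the simplified JSON message shape are assumptions, not the AIOS 1.0 API and not the actual MCP wire format. The point is only that the agent sees a small structured state and a fixed set of named actions instead of raw interface details.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyComputer:
    """Hypothetical stand-in for a desktop: windows, their buttons, and a focus."""
    windows: Dict[str, List[str]] = field(
        default_factory=lambda: {"Editor": ["Save", "Close"], "Browser": ["Reload"]}
    )
    focused: str = "Editor"
    log: List[str] = field(default_factory=list)  # record of performed clicks


class ContextServer:
    """MCP-style sketch: named tools dispatched from simplified JSON requests."""

    def __init__(self, computer: ToyComputer) -> None:
        self.computer = computer
        self.tools = {
            "get_state": self.get_state,
            "focus": self.focus,
            "click": self.click,
        }

    def get_state(self) -> dict:
        # Structured summary of the environment that a language model can read.
        c = self.computer
        return {
            "focused": c.focused,
            "windows": sorted(c.windows),
            "buttons": list(c.windows[c.focused]),
        }

    def focus(self, window: str) -> dict:
        if window not in self.computer.windows:
            raise ValueError(f"unknown window: {window}")
        self.computer.focused = window
        return self.get_state()

    def click(self, button: str) -> dict:
        c = self.computer
        if button not in c.windows[c.focused]:
            raise ValueError(f"no button {button!r} in {c.focused}")
        c.log.append(f"{c.focused}:{button}")
        return self.get_state()

    def handle(self, request: str) -> str:
        # Decode a request, run the named tool, and report success or failure.
        msg = json.loads(request)
        try:
            result = self.tools[msg["tool"]](**msg.get("arguments", {}))
            return json.dumps({"ok": True, "result": result})
        except (KeyError, TypeError, ValueError) as exc:
            return json.dumps({"ok": False, "error": str(exc)})


if __name__ == "__main__":
    server = ContextServer(ToyComputer())
    print(server.handle(json.dumps({"tool": "focus", "arguments": {"window": "Browser"}})))
    print(server.handle(json.dumps({"tool": "click", "arguments": {"button": "Reload"}})))
```

In this sketch the decision logic only has to choose a tool name and its arguments from the returned state, which is one way to read the decoupling of interface complexity from decision complexity described above.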
