Do Multi-Agents Dream of Electric Screens? Achieving Perfect Accuracy on AndroidWorld Through Task Decomposition

📅 2026-02-08
🤖 AI Summary
This work addresses the limitations of existing single-agent systems on the AndroidWorld benchmark, which struggle with complex tasks because of context contamination, undetected text input failures, and repetitive action loops. To overcome these challenges, the authors propose Minitap, a multi-agent system that separates cognitive roles across six specialized agents. Minitap decomposes tasks, deterministically validates text input against device state after execution, eliminates redundant actions, and uses meta-cognitive reasoning with loop detection to trigger a change of strategy. Evaluated on all 116 tasks in the AndroidWorld benchmark, Minitap achieves a 100% success rate, surpassing single-agent baselines by 21 percentage points. It is the first system to fully solve the benchmark and exceeds human performance, which stands at 80%. The implementation has been publicly released.

๐Ÿ“ Abstract
We present Minitap, a multi-agent system that achieves 100% success on the AndroidWorld benchmark, making it the first to fully solve all 116 tasks and to surpass human performance (80%). We first analyze why single-agent architectures fail: context pollution from mixed reasoning traces, silent text input failures that the agent does not detect, and repetitive action loops with no escape. Minitap addresses each failure through a targeted mechanism: cognitive separation across six specialized agents, deterministic post-validation of text input against device state, and meta-cognitive reasoning that detects cycles and triggers strategy changes. Ablations show that multi-agent decomposition contributes +21 points over single-agent baselines, verified execution adds +7 points, and meta-cognition adds +9 points. We release Minitap as open-source software: https://github.com/minitap-ai/mobile-use
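
Two of the mechanisms named in the abstract, deterministic post-validation of text input and cycle detection that triggers a strategy change, can be illustrated with a minimal Python sketch. This is not Minitap's implementation: the function names, the `read_field` callback, the action-string format, and the window and repeat thresholds are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Callable, Sequence


def verify_text_input(read_field: Callable[[], str], expected: str) -> bool:
    """Return True only if the text read back from the device equals `expected`.

    `read_field` is a hypothetical callback returning the current contents
    of the target text field (for example, parsed from a UI hierarchy dump).
    The check is deterministic: it compares strings, with no model in the loop.
    """
    return read_field() == expected


def detect_action_loop(history: Sequence[str], window: int = 6, max_repeats: int = 3) -> bool:
    """Return True if one action occurs at least `max_repeats` times
    among the last `window` actions. Both thresholds are illustrative."""
    recent = list(history)[-window:]
    if not recent:
        return False
    return max(Counter(recent).values()) >= max_repeats


def choose_strategy(history: Sequence[str], current: str, fallback: str) -> str:
    """Keep `current` unless a loop is detected, in which case use `fallback`."""
    return fallback if detect_action_loop(history) else current
```

In this sketch a controller would call `verify_text_input` after every typing action and retry or report failure when it returns `False`, and would call `choose_strategy` before each step so that a repeated action pattern leads to a different plan instead of another identical attempt.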
Problem

Research questions and friction points this paper is trying to address.

multi-agent systems
Android automation
task decomposition
mobile UI interaction
reasoning failures
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system
task decomposition
meta-cognitive reasoning
deterministic validation
Android automation
Pierre-Louis Favreau
minitap
Jean-Pierre Lo
minitap
Clement Guiguet
minitap
Charles Simon-Meunier
minitap
Nicolas Dehandschoewercker
minitap
Allen G. Roush
Thoughtworks
Judah Goldfeder
Columbia University
Ravid Shwartz-Ziv
New York University
machine learning, deep learning, representation learning theory, neuroscience