FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

📅 2026-05-08
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high wall-clock latency of self-evolution in large language model (LLM) agents, which the authors attribute to synchronized stage execution and load imbalance within each LLM-heavy stage. FlashEvolve replaces synchronized execution with asynchronous workers and queues so that different stages and steps can overlap. To handle the data staleness that asynchrony introduces, it tracks artifact versions and updates, discards, or patches stale artifacts in language space. It also adds speculative stage completion and adaptive workflow control to improve throughput and token efficiency. On GEPA workloads, it improves proposal throughput over synchronous GEPA by 3.5× on local vLLM and 4.9× on API serving. The authors report that the same design also applies to ACE and Meta-Harness.
📝 Abstract
LLM-based evolution has emerged as a promising way to improve agents by refining non-parametric artifacts, but its wall-clock cost remains a major bottleneck. We identify that this cost comes from synchronized stage execution and imbalance inside each LLM-heavy stage. We present FlashEvolve, an efficient framework that replaces synchronized execution with asynchronous workers and queues, allowing different stages and steps to overlap. To handle data staleness introduced by asynchrony, FlashEvolve tracks artifact versions and applies different policies to update, discard, or patch stale artifacts. Unlike weight-space staleness in asynchronous RL, language-space staleness is inspectable and repairable: a stale artifact is not just delayed work, but readable evidence that the LLM can reflect on, revise, and turn into useful evolution signal. FlashEvolve further improves throughput and token efficiency with speculative stage completion and adaptive workflow control. On GEPA workloads, FlashEvolve improves proposal throughput by $3.5\times$ on local vLLM and $4.9\times$ on API serving over synchronous GEPA. The same design also applies to ACE and Meta-Harness.
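The abstract's two core ideas, stages decoupled by workers and queues plus version-tagged artifacts with a staleness policy, can be illustrated with a minimal sketch. This is not the FlashEvolve implementation: the names `Artifact`, `Pool`, `staleness_action`, `propose`, `evaluate`, and `run_pipeline`, the extra "accept" label for fresh artifacts, and the lag thresholds `update_lag` and `patch_lag` are all hypothetical, and the mapping from version lag to update, patch, or discard is an assumption made only for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Artifact:
    text: str
    base_version: int  # pool version the proposal was generated against


class Pool:
    """Shared evolution state; `version` grows whenever an artifact is merged."""

    def __init__(self) -> None:
        self.version = 0


def staleness_action(artifact: Artifact, current_version: int,
                     update_lag: int = 1, patch_lag: int = 3) -> str:
    """Pick a policy from the version lag (thresholds are hypothetical)."""
    lag = current_version - artifact.base_version
    if lag <= 0:
        return "accept"   # fresh: use as is
    if lag <= update_lag:
        return "update"   # slightly stale: re-tag against the current version
    if lag <= patch_lag:
        return "patch"    # moderately stale: would ask the LLM to revise it
    return "discard"      # too stale to be worth repairing


async def propose(pool: Pool, out_q: asyncio.Queue, n: int) -> None:
    """Producer stage: tags each proposal with the version it observed."""
    for i in range(n):
        await out_q.put(Artifact(f"proposal-{i}", pool.version))
        await asyncio.sleep(0)  # yield so the evaluator stage can overlap
    await out_q.put(None)       # sentinel: no more work


async def evaluate(pool: Pool, in_q: asyncio.Queue, log: list) -> None:
    """Consumer stage: applies the staleness policy, then merges survivors."""
    while True:
        artifact = await in_q.get()
        if artifact is None:
            break
        action = staleness_action(artifact, pool.version)
        log.append((artifact.text, action))
        if action != "discard":
            pool.version += 1   # merging an artifact advances the pool version


async def run_pipeline(n: int = 5) -> list:
    pool, queue, log = Pool(), asyncio.Queue(), []
    # Both stages run concurrently instead of waiting on a global barrier.
    await asyncio.gather(propose(pool, queue, n), evaluate(pool, queue, log))
    return log


if __name__ == "__main__":
    for text, action in asyncio.run(run_pipeline()):
        print(text, action)
```

In this sketch the queue is what removes the synchronization barrier: the proposer never waits for evaluation to finish, so an artifact can arrive after the pool has moved on, and the version tag is what makes that lag measurable. A real system would replace the "patch" branch with an LLM call that revises the stale artifact against the current state.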
Problem

Research questions and friction points this paper is trying to address.

LLM-based evolution
wall-clock cost
synchronized execution
stage imbalance
agent self-evolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous orchestration
agent self-evolution
artifact staleness handling
speculative execution
LLM-based evolution