🤖 AI Summary
Self-evolving multimodal agents often fall into a vicious cycle of uninformative interactions and noisy knowledge, because data inefficiency and knowledge interference are coupled. This work proposes the Ace-Skill framework, which for the first time jointly designs a prioritized sampling strategy with lazy-decay proficiency tracking and a semantic clustering scheme for knowledge organization. Together, these components optimize interaction sampling and knowledge structure, establishing a virtuous evolutionary cycle. The method achieves substantial gains across four multimodal tool-use benchmarks, including a 35.46% relative improvement in Avg@4 accuracy. Notably, it enables an open-source 35B MoE model to match or surpass proprietary models, and the acquired knowledge transfers zero-shot to smaller 9B and 4B models.
📝 Abstract
Self-evolving agents present a promising path toward continual adaptation by distilling task interactions into reusable knowledge artifacts. In practice, this paradigm remains hindered by two coupled bottlenecks: data inefficiency, where costly rollout effort is disproportionately spent on low-value samples rather than informative ones, and knowledge interference, where heterogeneous knowledge stored in shared repositories leads to noisy retrieval and task-misaligned guidance. Together, these issues form a self-reinforcing failure loop in which uninformative rollouts yield noisy knowledge, which in turn degrades subsequent rollouts. In this work, we introduce Ace-Skill, a co-evolutionary framework that jointly optimizes rollout allocation and knowledge organization for self-evolving multimodal agents. Specifically, Ace-Skill combines a prioritized sampler with lazy-decay proficiency tracking, which focuses rollouts on informative and insufficiently mastered samples, and a clustered organizer that semantically clusters knowledge for cleaner retrieval and more reliable adaptation. By improving sampling and organization together, Ace-Skill turns self-evolution into a virtuous cycle in which more informative rollouts produce higher-quality knowledge that supports stronger subsequent rollouts. Across four multimodal tool-use benchmarks, Ace-Skill delivers strong gains (e.g., +35.46% relative improvement in Avg@4 accuracy), enabling an open-source 35B MoE model to match or surpass proprietary models. The acquired knowledge also transfers effectively in a zero-shot manner to smaller 9B and 4B models, allowing resource-constrained agents to inherit advanced capabilities without additional training. The code is publicly available at https://github.com/AMAP-ML/Ace-Skill.
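
The abstract does not spell out how the prioritized sampler or the lazy-decay proficiency tracking are computed, so the following is only an illustrative sketch of the general idea, not the Ace-Skill implementation. It assumes that each sample carries a proficiency score in [0, 1], that the score decays geometrically with the number of global steps since it was last touched (applied lazily, only when the score is read), and that sampling priority is one minus proficiency. The names `PrioritizedSampler`, `Proficiency`, `decay`, and `lr` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Proficiency:
    score: float = 0.0   # assumed mastery estimate in [0, 1]
    last_step: int = 0   # global step at which `score` was last brought up to date


class PrioritizedSampler:
    """Hypothetical sampler: favors samples whose proficiency is low or stale."""

    def __init__(self, sample_ids, decay=0.9, lr=0.5, seed=0):
        self.decay = decay   # assumed per-step decay factor
        self.lr = lr         # assumed step size for the proficiency update
        self.step = 0
        self.rng = random.Random(seed)
        self.stats = {sid: Proficiency() for sid in sample_ids}

    def _current(self, sid):
        # Lazy decay: apply all decay owed since the last touch, only on access,
        # instead of sweeping every sample at every step.
        p = self.stats[sid]
        p.score *= self.decay ** (self.step - p.last_step)
        p.last_step = self.step
        return p.score

    def priority(self, sid):
        # Less-mastered samples receive higher priority.
        return 1.0 - self._current(sid)

    def sample(self, k):
        # Draw k ids with replacement, weighted by priority.
        ids = list(self.stats)
        weights = [self.priority(s) + 1e-6 for s in ids]
        return self.rng.choices(ids, weights=weights, k=k)

    def update(self, sid, success):
        # Advance the global clock, then move the score toward the rollout outcome.
        self.step += 1
        cur = self._current(sid)
        self.stats[sid].score = cur + self.lr * (float(success) - cur)


sampler = PrioritizedSampler(["task_a", "task_b"])
sampler.update("task_a", success=True)    # task_a proficiency rises to 0.5
sampler.update("task_b", success=False)   # one more step passes; task_a decays lazily
batch = sampler.sample(4)
```

Under these assumptions, a sample that was solved once but not revisited gradually regains priority as its proficiency decays, while untouched bookkeeping costs nothing until the sample is next read.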