🤖 AI Summary
LLM-based agents suffer from suboptimal performance due to poorly engineered prompts, ambiguous tool descriptions, and misconfigured parameters; existing optimization methods are either overly complex or neglect inter-component dependencies. This paper introduces ARTEMIS—the first end-to-end, code-free, semantics-driven framework for joint agent configuration optimization. It employs semantic-aware genetic operators to automatically evolve complete agent configurations—including prompts, tool schemas, and parameters—without architectural modifications and with full compatibility across commercial and open-source LLMs. Its core innovation is a multimodal evolutionary paradigm that integrates semantic log parsing, automatic component discovery, and execution-signal extraction to enable cross-component co-optimization. Evaluated on four representative tasks, ARTEMIS achieves a +13.6% acceptance rate, a +10.1% overall performance gain, a +22% improvement in mathematical accuracy, and a 36.9% reduction in inference token consumption.
📝 Abstract
Agentic AI systems built on large language models (LLMs) offer significant potential for automating complex workflows, from software development to customer support. However, LLM agents often underperform due to suboptimal configurations: poorly tuned prompts, tool descriptions, and parameters that typically require weeks of manual refinement. Existing optimization methods are either too complex for general use or treat components in isolation, missing critical interdependencies.
We present ARTEMIS, a no-code evolutionary optimization platform that jointly optimizes agent configurations through semantically-aware genetic operators. Given only a benchmark script and natural language goals, ARTEMIS automatically discovers configurable components, extracts performance signals from execution logs, and evolves configurations without requiring architectural modifications.
We evaluate ARTEMIS on four representative agent systems: the *ALE Agent* for competitive programming on the AtCoder Heuristic Contest, achieving a **13.6% improvement** in acceptance rate; the *Mini-SWE Agent* for code optimization on SWE-Perf, with a statistically significant **10.1% performance gain**; and the *CrewAI Agent* for cost and mathematical reasoning on Math Odyssey, achieving a statistically significant **36.9% reduction** in the number of tokens required for evaluation. We also evaluate the *MathTales-Teacher Agent*, powered by a smaller open-source model (Qwen2.5-7B), on GSM8K primary-level mathematics problems, achieving a **22% accuracy improvement** and demonstrating that ARTEMIS can optimize agents based on both commercial and local models.
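To make the optimization loop described above concrete, here is a minimal sketch of an evolutionary search over an agent configuration (prompt, tool description, parameters) driven by a benchmark score. All names (`mutate`, `evolve`, `toy_evaluate`) and the string-tweak mutation are illustrative assumptions, not ARTEMIS's actual API; in the real system, mutations would be semantics-aware LLM rewrites and the fitness would come from execution logs.

```python
import random

def mutate(config, rng):
    """Perturb one component of the configuration.
    A placeholder string tweak stands in for an LLM-driven semantic rewrite."""
    child = dict(config)
    key = rng.choice(sorted(child))
    child[key] = child[key] + " [revised]"
    return child

def evolve(initial, evaluate, generations=5, offspring=4, seed=0):
    """Simple (1+lambda)-style loop: keep the best-scoring configuration."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(generations):
        for _ in range(offspring):
            child = mutate(best, rng)
            score = evaluate(child)
            if score > best_score:
                best, best_score = child, score
    return best, best_score

def toy_evaluate(cfg):
    """Toy fitness: count revision markers in the system prompt
    (a real fitness would be a benchmark score parsed from logs)."""
    return cfg["system_prompt"].count("[revised]")

cfg = {"system_prompt": "You are a helpful agent.",
       "tool_description": "Runs the benchmark.",
       "params": "temperature=0.7"}
best, score = evolve(cfg, toy_evaluate)
```

The key design point mirrored here is that the search treats the whole configuration as one genome, so a mutation to any component competes on the same end-to-end fitness signal.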