Evolution without an Oracle: Driving Effective Evolution with LLM Judges

📅 2025-11-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates whether evolutionary computation (EC) can be driven solely by large language models (LLMs) acting as subjective evaluators, in settings that lack an objective fitness oracle. To this end, it proposes MADE (Multi-Agent Decomposed Evolution), a framework that formalizes problem decomposition, orchestrates multi-agent collaboration, and models LLM-based subjective feedback so that ambiguous human preferences become stable selection pressure. Its core contribution is presented as the first end-to-end subjective EC paradigm that eliminates reliance on oracles, shifting optimization from computable metrics to describable quality. On the DevAI and InfoBench benchmarks, MADE achieves a relative improvement of over 50% in software requirement satisfaction (39.9% to 61.9%) and a 95% perfect adherence rate on complex instruction following.

📝 Abstract

The integration of Large Language Models (LLMs) with Evolutionary Computation (EC) has unlocked new frontiers in scientific discovery but remains shackled by a fundamental constraint: the reliance on an Oracle, that is, an objective, machine-computable fitness function. This paper breaks this barrier by asking: Can evolution thrive in a purely subjective landscape governed solely by LLM judges? We introduce MADE (Multi-Agent Decomposed Evolution), a framework that tames the inherent noise of subjective evaluation through "Problem Specification." By decomposing vague instructions into specific, verifiable sub-requirements, MADE transforms high-variance LLM feedback into stable, precise selection pressure. The results are transformative: across complex benchmarks like DevAI and InfoBench, MADE outperforms strong baselines by over 50% (relative) in software requirement satisfaction (39.9% to 61.9%) and achieves a 95% perfect pass rate on complex instruction following. This work validates a fundamental paradigm shift: moving from optimizing "computable metrics" to "describable qualities," thereby unlocking evolutionary optimization for the vast open-ended domains where no ground truth exists.
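
The "over 50%" figure is easiest to read as a relative gain. Assuming 39.9% is the baseline satisfaction rate and 61.9% is the rate with MADE (the abstract gives the pair without labeling each value), the arithmetic works out as:

\[
\frac{61.9 - 39.9}{39.9} = \frac{22.0}{39.9} \approx 0.551
\]

That is a relative improvement of roughly 55%, or 22.0 percentage points in absolute terms.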

Problem

Research questions and friction points this paper is trying to address.

Eliminating reliance on objective fitness functions in evolutionary computation

Enabling evolution under subjective evaluation by LLM judges

Transforming vague instructions into verifiable sub-requirements for optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLM judges instead of objective fitness functions

Decomposing vague instructions into verifiable sub-requirements

Transforming high-variance LLM feedback into stable selection pressure
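
The three ideas above can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the MADE implementation: the names `stub_judge`, `fitness`, `mutate`, and `evolve` are hypothetical, the judge is a keyword check standing in for an LLM call, and the variation operator appends random words where a real system would presumably ask an LLM to revise the candidate. What it does show is how a checklist of sub-requirements turns a yes/no judge into a graded fitness signal that an elitist selection step can use.

```python
import random
from typing import Callable, List

# Hypothetical signature: (candidate, sub_requirement) -> satisfied?
Judge = Callable[[str, str], bool]


def stub_judge(candidate: str, requirement: str) -> bool:
    """Stand-in for an LLM judge call: a plain whole-word keyword check."""
    return requirement in candidate.split()


def fitness(candidate: str, requirements: List[str], judge: Judge) -> float:
    """Fraction of sub-requirements the judge marks as satisfied."""
    return sum(judge(candidate, r) for r in requirements) / len(requirements)


def mutate(candidate: str, vocabulary: List[str], rng: random.Random) -> str:
    """Toy variation operator (an assumption): append one random word."""
    return (candidate + " " + rng.choice(vocabulary)).strip()


def evolve(
    population: List[str],
    requirements: List[str],
    vocabulary: List[str],
    judge: Judge,
    generations: int = 20,
    seed: int = 0,
) -> str:
    """Elitist loop: parents and children compete on judged fitness."""
    rng = random.Random(seed)
    size = len(population)

    def score(candidate: str) -> float:
        return fitness(candidate, requirements, judge)

    for _ in range(generations):
        children = [mutate(p, vocabulary, rng) for p in population]
        # sorted() is stable, so on ties parents stay ahead of children.
        population = sorted(population + children, key=score, reverse=True)[:size]
        if score(population[0]) == 1.0:
            break  # every sub-requirement is judged satisfied
    return population[0]


if __name__ == "__main__":
    # Decomposed sub-requirements for a vague instruction (illustrative only).
    requirements = ["auth", "cache", "tests"]
    best = evolve(["", "", "", ""], requirements, requirements + ["noise"], stub_judge)
    print(best, fitness(best, requirements, stub_judge))
```

Because parents and children are ranked together, the best judged score never decreases from one generation to the next. Swapping `stub_judge` for a real LLM call would reintroduce noise, which is the problem the decomposition step is meant to reduce.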

🔎 Similar Papers

PhaseEvo: Towards Unified In-Context Prompt Optimization for Large Language Models