Guiding Evolution of Artificial Life Using Vision-Language Models

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a central challenge in artificial life (ALife): achieving open-ended evolution, in particular autonomous goal generation and sustained growth in complexity. We propose ASAL++, a framework that integrates a vision-language model with a second foundation model (e.g., Gemma-3) inside the Lenia cellular automaton, forming a closed loop that dynamically generates and iteratively refines visual evolutionary targets based on a simulation's visual history. ASAL++ supports two target-evolution strategies: Evolved Supervised Targets (EST), which optimizes each iteration against a single newly proposed prompt, and Evolved Temporal Targets (ETT), which optimizes against the entire sequence of generated prompts to encourage temporal coherence and interpretability. Neither strategy requires a predefined end goal, enabling semantically aligned, closed-loop search. Experiments show that EST promotes greater visual novelty, while ETT yields more coherent and interpretable evolutionary trajectories. These results support the feasibility of using multimodal foundation models to drive open-ended evolution in ALife systems.

📝 Abstract
Foundation models (FMs) have recently opened up new frontiers in the field of artificial life (ALife) by providing powerful tools to automate search through ALife simulations. Previous work aligns ALife simulations with natural language target prompts using vision-language models (VLMs). We build on Automated Search for Artificial Life (ASAL) by introducing ASAL++, a method for open-ended-like search guided by multimodal FMs. We use a second FM to propose new evolutionary targets based on a simulation's visual history. This induces an evolutionary trajectory with increasingly complex targets. We explore two strategies: (1) evolving a simulation to match a single new prompt at each iteration (Evolved Supervised Targets: EST) and (2) evolving a simulation to match the entire sequence of generated prompts (Evolved Temporal Targets: ETT). We test our method empirically in the Lenia substrate using Gemma-3 to propose evolutionary targets, and show that EST promotes greater visual novelty, while ETT fosters more coherent and interpretable evolutionary sequences. Our results suggest that ASAL++ points towards new directions for FM-driven ALife discovery with open-ended characteristics.
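The abstract describes an iterative loop: a second FM proposes a new target from the simulation's visual history, then evolutionary search optimizes the simulation against either the newest prompt only (EST) or the whole prompt sequence (ETT). The sketch below illustrates that control flow only; the FM, VLM scorer, and evolutionary search are stand-in stubs (`propose_target`, `similarity`, `evolve` are hypothetical names, not the paper's API), and the real system would use Gemma-3, a CLIP-style VLM, and the Lenia substrate.

```python
import random

def propose_target(visual_history):
    # Stub for the target-proposing FM (the paper uses Gemma-3).
    # Here we just synthesize a placeholder prompt from the history length.
    return f"target-{len(visual_history)}"

def similarity(frame, prompt):
    # Stub for a VLM image-text similarity score (CLIP-style in the paper).
    # Seeded per (frame, prompt) pair so repeated calls are consistent.
    random.seed(hash((frame, prompt)) % (2**32))
    return random.random()

def evolve(prompts, n_candidates=10):
    # Stub evolutionary search over simulation parameters: return the
    # candidate whose mean similarity to the active prompt set is highest.
    candidates = [f"frame-{i}" for i in range(n_candidates)]
    return max(
        candidates,
        key=lambda f: sum(similarity(f, p) for p in prompts) / len(prompts),
    )

def asal_pp(iterations=3, mode="EST"):
    # Closed-loop target evolution as described in the abstract.
    history, targets = [], []
    for _ in range(iterations):
        targets.append(propose_target(history))
        # EST: match only the newest prompt; ETT: match the full sequence.
        active = targets[-1:] if mode == "EST" else targets
        history.append(evolve(active))
    return history, targets
```

The only difference between the two strategies in this sketch is the `active` slice: EST discards earlier prompts at each step, while ETT keeps the whole trajectory in the objective, which is what the paper credits for ETT's more coherent evolutionary sequences.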
Problem

Research questions and friction points this paper is trying to address.

Guiding artificial life evolution using vision-language models
Automating search in ALife simulations with multimodal foundation models
Generating increasingly complex evolutionary targets for simulations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language models guide artificial life evolution
Automated search with multimodal foundation models
FM proposes new targets from visual history
Nikhil Baid
University College London, UK
Hannah Erlebach
DPhil @ FLAIR, Oxford
reinforcement learning · cooperative AI · AI safety
Paul Hellegouarch
University College London, UK and Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, Paris, France
Frederico Wieser
University College London, UK