🤖 AI Summary
This work addresses a central challenge in artificial life (ALife): achieving open-ended evolution, i.e., autonomous goal generation and sustained growth in complexity. We propose ASAL++, a framework that extends Automated Search for Artificial Life (ASAL) by pairing the vision-language model used for target matching with a second foundation model (e.g., Gemma-3) that proposes new visual evolutionary targets from a simulation's history, forming a closed coevolutionary loop in the Lenia cellular automaton. ASAL++ supports two objective-evolution strategies: Evolved Supervised Targets (EST), which optimizes each generation against the single newest proposed target, and Evolved Temporal Targets (ETT), which optimizes against the entire sequence of proposed targets to enforce temporal coherence. Neither strategy requires a predefined end goal, enabling semantically aligned, closed-loop search. Experiments show that EST promotes greater morphological novelty, while ETT yields more coherent and interpretable evolutionary trajectories, supporting the use of multimodal foundation models to drive open-ended evolution in ALife systems.
📝 Abstract
Foundation models (FMs) have recently opened up new frontiers in the field of artificial life (ALife) by providing powerful tools to automate search through ALife simulations. Previous work aligns ALife simulations with natural language target prompts using vision-language models (VLMs). We build on Automated Search for Artificial Life (ASAL) by introducing ASAL++, a method for open-ended-like search guided by multimodal FMs. We use a second FM to propose new evolutionary targets based on a simulation's visual history. This induces an evolutionary trajectory with increasingly complex targets.
We explore two strategies: (1) evolving a simulation to match a single new prompt at each iteration (Evolved Supervised Targets: EST) and (2) evolving a simulation to match the entire sequence of generated prompts (Evolved Temporal Targets: ETT). We test our method empirically in the Lenia substrate using Gemma-3 to propose evolutionary targets, and show that EST promotes greater visual novelty, while ETT fosters more coherent and interpretable evolutionary sequences.
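The two strategies differ only in the objective the inner evolutionary search optimizes. The loop below is a minimal, hypothetical sketch of that difference: `simulate`, `embed`, and `propose_target` are toy stand-ins for the Lenia rollout, a CLIP-style VLM embedder, and the target-proposing foundation model (Gemma-3 in the paper), and the hill climber replaces whatever optimizer the real system uses.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding/parameter dimensionality

def embed(frame):
    # Stand-in for a VLM embedder (e.g. CLIP): unit-normalize a vector
    # so alignment can be scored with a dot product.
    return frame / (np.linalg.norm(frame) + 1e-8)

def simulate(params):
    # Toy stand-in for a Lenia rollout rendered to a single frame.
    return np.tanh(params)

def propose_target(history):
    # Stand-in for the second FM proposing a new visual target from the
    # simulation's visual history; here just a random unit vector.
    return embed(rng.normal(size=DIM))

def optimize(score_fn, init, steps=200):
    # Simple (1+1) hill climber over simulation parameters.
    best, best_s = init, score_fn(init)
    for _ in range(steps):
        cand = best + 0.1 * rng.normal(size=DIM)
        s = score_fn(cand)
        if s > best_s:
            best, best_s = cand, s
    return best, best_s

def run(mode="EST", rounds=3):
    params = rng.normal(size=DIM)
    targets, history = [], []
    for _ in range(rounds):
        history.append(embed(simulate(params)))
        targets.append(propose_target(history))
        if mode == "EST":
            # EST: match only the newest proposed target.
            score = lambda p: float(embed(simulate(p)) @ targets[-1])
        else:
            # ETT: match the entire sequence of proposed targets.
            score = lambda p: float(
                np.mean([embed(simulate(p)) @ t for t in targets]))
        params, s = optimize(score, params)
    return s
```

Under this framing, EST is free to jump toward each new target (favoring novelty), while ETT must keep satisfying all earlier targets at once, which biases the search toward coherent trajectories.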
Our results suggest that ASAL++ opens new directions for FM-driven ALife discovery with open-ended characteristics.