Guiding Evolution of Artificial Life Using Vision-Language Models

📅 2025-09-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a central challenge in artificial life (ALife): achieving open-ended evolution, in particular autonomous goal generation and sustained growth in complexity. We propose ASAL++, a framework that integrates a vision-language model with a second foundation model (e.g., Gemma-3) inside the Lenia cellular automaton, forming a closed loop that dynamically generates and iteratively refines visual evolutionary targets based on a simulation's visual history. ASAL++ supports two target-evolution strategies: Evolved Supervised Targets (EST), which optimizes each iteration against a single newly proposed prompt, and Evolved Temporal Targets (ETT), which optimizes against the entire sequence of generated prompts to encourage temporal coherence and interpretability. Neither strategy requires a predefined end goal, enabling semantically aligned, closed-loop search. Experiments show that EST promotes greater visual novelty, while ETT yields more coherent and interpretable evolutionary trajectories. These results support the feasibility of using multimodal foundation models to drive open-ended evolution in ALife systems.

📝 Abstract
Foundation models (FMs) have recently opened up new frontiers in the field of artificial life (ALife) by providing powerful tools to automate search through ALife simulations. Previous work aligns ALife simulations with natural language target prompts using vision-language models (VLMs). We build on Automated Search for Artificial Life (ASAL) by introducing ASAL++, a method for open-ended-like search guided by multimodal FMs. We use a second FM to propose new evolutionary targets based on a simulation's visual history. This induces an evolutionary trajectory with increasingly complex targets. We explore two strategies: (1) evolving a simulation to match a single new prompt at each iteration (Evolved Supervised Targets: EST) and (2) evolving a simulation to match the entire sequence of generated prompts (Evolved Temporal Targets: ETT). We test our method empirically in the Lenia substrate using Gemma-3 to propose evolutionary targets, and show that EST promotes greater visual novelty, while ETT fosters more coherent and interpretable evolutionary sequences. Our results suggest that ASAL++ points towards new directions for FM-driven ALife discovery with open-ended characteristics.
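The abstract describes an iterative loop: a second FM proposes a new target from the simulation's visual history, then evolutionary search optimizes the simulation against either the newest prompt only (EST) or the whole prompt sequence (ETT). The sketch below illustrates that control flow only; the FM, VLM scorer, and evolutionary search are stand-in stubs (`propose_target`, `similarity`, `evolve` are hypothetical names, not the paper's API), and the real system would use Gemma-3, a CLIP-style VLM, and the Lenia substrate.

```python
import random

def propose_target(visual_history):
    # Stub for the target-proposing FM (the paper uses Gemma-3).
    # Here we just synthesize a placeholder prompt from the history length.
    return f"target-{len(visual_history)}"

def similarity(frame, prompt):
    # Stub for a VLM image-text similarity score (CLIP-style in the paper).
    # Seeded per (frame, prompt) pair so repeated calls are consistent.
    random.seed(hash((frame, prompt)) % (2**32))
    return random.random()

def evolve(prompts, n_candidates=10):
    # Stub evolutionary search over simulation parameters: return the
    # candidate whose mean similarity to the active prompt set is highest.
    candidates = [f"frame-{i}" for i in range(n_candidates)]
    return max(
        candidates,
        key=lambda f: sum(similarity(f, p) for p in prompts) / len(prompts),
    )

def asal_pp(iterations=3, mode="EST"):
    # Closed-loop target evolution as described in the abstract.
    history, targets = [], []
    for _ in range(iterations):
        targets.append(propose_target(history))
        # EST: match only the newest prompt; ETT: match the full sequence.
        active = targets[-1:] if mode == "EST" else targets
        history.append(evolve(active))
    return history, targets
```

The only difference between the two strategies in this sketch is the `active` slice: EST discards earlier prompts at each step, while ETT keeps the whole trajectory in the objective, which is what the paper credits for ETT's more coherent evolutionary sequences.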
Problem

Research questions and friction points this paper is trying to address.

Guiding artificial life evolution using vision-language models
Automating search in ALife simulations with multimodal foundation models
Generating increasingly complex evolutionary targets for simulations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language models guide artificial life evolution
Automated search with multimodal foundation models
FM proposes new targets from visual history
Nikhil Baid
University College London, UK
Hannah Erlebach
DPhil @ FLAIR, Oxford
reinforcement learning · cooperative AI · AI safety
Paul Hellegouarch
University College London, UK and Institut Pasteur, Université Paris Cité, CNRS UMR 3571, Decision and Bayesian Computation, Paris, France
Frederico Wieser
University College London, UK