Language Modeling by Language Models

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the feasibility of large language models (LLMs) autonomously discovering novel language model (LM) architectures. To this end, we propose Genesys—a multi-agent collaborative framework integrating genetic programming, adversarial peer review, code auto-generation, generative pretraining, and downstream validation, augmented by a scaling-law-guided, stepwise scale-up strategy. Compared to conventional prompting-based approaches, Genesys improves architecture design success rate by 86 percentage points and enables decomposable, efficient automated model evolution. The system generates 1,162 novel architectures, of which 1,062 successfully complete pretraining validation. The top-performing model surpasses established architectures—including GPT-2 and Mamba2—on 6 of 9 standard benchmarks. To our knowledge, this is the first demonstration of end-to-end, closed-loop, autonomous LM architecture discovery driven entirely by LLMs.

📝 Abstract
Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional stages of research, from ideation and literature search (proposal stage) to design implementation (code generation), generative pre-training, and downstream evaluation (verification). Using ideas from scaling laws, our system, Genesys, employs a Ladder of Scales approach; new designs are proposed, adversarially reviewed, implemented, and selectively verified at increasingly larger model scales (14M~350M parameters) with a narrowing budget (the number of models we can train at each scale). To help make discovery efficient and factorizable, Genesys uses a novel genetic programming backbone, which we show has empirical advantages over commonly used direct prompt generation workflows (e.g., ~86 percentage-point improvement in successful design generation, a key bottleneck). We report experiments involving 1,162 newly discovered designs (1,062 fully verified through pre-training) and find the best designs to be highly competitive with known architectures (e.g., outperforming GPT2, Mamba2, etc., on 6/9 common benchmarks). We couple these results with comprehensive system-level ablations and formal results, which give broader insights into the design of effective autonomous discovery systems.
Problem

Research questions and friction points this paper is trying to address.

Leveraging LLMs to discover novel language model architectures
Simulating research stages from ideation to evaluation via multi-agent LLMs
Improving design generation efficiency with genetic programming backbone
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent LLM approach for architecture discovery
Ladder of Scales for selective model verification
Genetic programming backbone for efficient design generation
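The Ladder of Scales idea above can be illustrated with a minimal sketch: candidate designs are verified at increasingly larger model scales while the per-scale training budget narrows, so only the fittest designs reach expensive large-scale pretraining. This is an assumed toy rendering, not the paper's implementation; the scale/budget values and names like `train_and_score` are hypothetical placeholders.

```python
import random

# Illustrative scales (parameter counts) and narrowing per-scale budgets.
# The paper's actual schedule spans roughly 14M~350M parameters; these
# specific numbers are assumptions for the sketch.
SCALES = [14e6, 31e6, 70e6, 125e6, 350e6]
BUDGETS = [64, 32, 16, 8, 4]

def train_and_score(design, n_params):
    """Placeholder for pretraining + downstream evaluation at one scale."""
    return random.random()  # stand-in for a real fitness score

def ladder_of_scales(candidates):
    pool = list(candidates)
    for n_params, budget in zip(SCALES, BUDGETS):
        # Verify at most `budget` designs at this scale...
        pool = pool[:budget]
        scored = [(train_and_score(d, n_params), d) for d in pool]
        scored.sort(key=lambda t: t[0], reverse=True)
        # ...and promote only the top half to the next, larger scale.
        pool = [d for _, d in scored[: max(1, budget // 2)]]
    return pool  # survivors verified up through the largest scale

survivors = ladder_of_scales([f"design-{i}" for i in range(100)])
print(len(survivors))  # 100 -> 32 -> 16 -> 8 -> 4 -> 2 survivors
```

The narrowing budget is the key design choice: cheap small-scale runs filter aggressively, so the handful of 350M-parameter verifications are spent only on designs that already showed promise at smaller scales.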
Junyan Cheng
Allen Institute for AI, Dartmouth College
Peter Clark
Allen Institute for Artificial Intelligence (AI2)
Kyle Richardson
Allen Institute for AI