🤖 AI Summary
Constructed language (conlang) design remains heavily reliant on expert linguistic knowledge, limiting its accessibility and scalability. Method: We propose the first end-to-end large language model (LLM)-driven framework for automated conlang generation. Our approach employs a multi-stage pipeline that decouples phonology, morphology, syntax, lexicon, and translation modules, augmented by multi-hop meta-linguistic reasoning, controllable stochasticity injection, and feedback-driven self-refinement, enabling fully autonomous, coherent language generation without human intervention. Contribution/Results: This work pioneers the systematic use of LLMs' meta-linguistic capabilities for full-stack conlang construction, balancing logical consistency with typological diversity. Experiments demonstrate significant improvements over baselines in structural coherence, cross-linguistic typological coverage, and translation fidelity, validating LLMs as effective computational creativity engines for formalized artificial language generation.
📝 Abstract
Constructed languages (conlangs) such as Esperanto and Quenya have played diverse roles in art, philosophy, and international communication. Meanwhile, large-scale foundation models have revolutionized creative generation in text, images, and beyond. In this work, we employ modern LLMs as computational creativity aids for end-to-end conlang creation. We introduce ConlangCrafter, a multi-hop pipeline that decomposes language design into modular stages -- phonology, morphology, syntax, lexicon, and translation. At each stage, our method exploits LLMs' meta-linguistic reasoning capabilities, injecting randomness to encourage diversity and applying self-refinement feedback to promote consistency in the emerging language description. We evaluate ConlangCrafter on metrics measuring coherence and typological diversity, demonstrating its ability to produce coherent and varied conlangs without human linguistic expertise.
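The staged pipeline described above can be sketched in code. This is a minimal illustrative skeleton, not the paper's actual implementation: the `llm` callable, the `STAGES` list, the typological "nudge" prompts, and the number of refinement rounds are all assumptions made for the sake of the example.

```python
import random

# Hypothetical stage order; the paper decomposes design into these five modules.
STAGES = ["phonology", "morphology", "syntax", "lexicon", "translation"]


def generate_conlang(llm, seed=0, refine_rounds=2):
    """Sketch of a multi-stage conlang pipeline.

    `llm` is any callable mapping a prompt string to generated text
    (a real model client or a stub). Each stage conditions on the
    language description produced so far.
    """
    rng = random.Random(seed)  # controllable stochasticity injection
    spec = {}                  # the emerging language description

    for stage in STAGES:
        # Inject randomness: sample a typological bias into the prompt
        # so repeated runs yield diverse languages.
        nudge = rng.choice(
            ["agglutinative", "fusional", "isolating", "polysynthetic"]
        )
        draft = llm(
            f"Design the {stage} of a new conlang (bias: {nudge}). "
            f"Existing description: {spec}"
        )
        # Feedback-driven self-refinement: critique the draft against
        # the accumulated description, then revise.
        for _ in range(refine_rounds):
            critique = llm(
                f"List inconsistencies between this {stage} draft and "
                f"the description {spec}:\n{draft}"
            )
            draft = llm(f"Revise the {stage} given this critique:\n{critique}\n{draft}")
        spec[stage] = draft

    return spec
```

With a stub model (`lambda prompt: "draft"`), the function returns a dictionary with one entry per stage in pipeline order, showing how later stages are conditioned on earlier ones via the shared `spec`.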