ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline

📅 2025-08-08
🤖 AI Summary
Background: Constructed language (conlang) design relies heavily on expert linguistic knowledge, which limits accessibility and scalability. Method: The paper proposes ConlangCrafter, an end-to-end framework for automated conlang generation driven by large language models (LLMs). A multi-hop pipeline splits language design into phonology, morphology, syntax, lexicon, and translation stages; each stage draws on the LLM's meta-linguistic reasoning, injects randomness to encourage diversity, and applies self-refinement feedback to keep the emerging language description consistent, so no human linguistic expertise is required. Contribution/Results: The work applies LLMs' meta-linguistic capabilities to full conlang construction, balancing logical consistency with typological diversity. Evaluation on metrics of coherence and typological diversity indicates that the pipeline produces coherent and varied conlangs, supporting LLMs as computational creativity aids for artificial language design.

📝 Abstract
Constructed languages (conlangs) such as Esperanto and Quenya have played diverse roles in art, philosophy, and international communication. Meanwhile, large-scale foundation models have revolutionized creative generation in text, images, and beyond. In this work, we leverage modern LLMs as computational creativity aids for end-to-end conlang creation. We introduce ConlangCrafter, a multi-hop pipeline that decomposes language design into modular stages -- phonology, morphology, syntax, lexicon generation, and translation. At each stage, our method leverages LLMs' meta-linguistic reasoning capabilities, injecting randomness to encourage diversity and leveraging self-refinement feedback to encourage consistency in the emerging language description. We evaluate ConlangCrafter on metrics measuring coherence and typological diversity, demonstrating its ability to produce coherent and varied conlangs without human linguistic expertise.
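The staged design described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `FEATURE_POOL` options, the prompt wording, and the `stub_llm` placeholder are all assumptions made for the example. Each stage receives the language description produced so far (the multi-hop part) plus one randomly drawn constraint (the randomness injection).

```python
import random
from typing import Callable, Dict

# Stage order taken from the abstract.
STAGES = ["phonology", "morphology", "syntax", "lexicon", "translation"]

# Hypothetical constraint pools; the paper's actual randomization is not shown here.
FEATURE_POOL = {
    "phonology": ["a small vowel inventory", "tonal contrasts", "ejective consonants"],
    "morphology": ["agglutinative suffixes", "noun classes", "no inflection"],
    "syntax": ["verb-initial order", "verb-final order", "free word order"],
    "lexicon": ["short roots", "heavy compounding", "reduplication"],
    "translation": ["a short greeting", "a proverb", "a weather report"],
}


def run_pipeline(llm: Callable[[str], str], seed: int) -> Dict[str, str]:
    """Run the stages in order, feeding every earlier output into each later prompt."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    spec: Dict[str, str] = {}
    for stage in STAGES:
        constraint = rng.choice(FEATURE_POOL[stage])  # randomness injection
        context = "\n".join(f"[{name}] {text}" for name, text in spec.items())
        prompt = (
            f"Language so far:\n{context}\n\n"
            f"Design the {stage}. Required feature: {constraint}."
        )
        spec[stage] = llm(prompt)
    return spec


def stub_llm(prompt: str) -> str:
    """Placeholder for a real model call; echoes the instruction line of the prompt."""
    return "DRAFT <- " + prompt.splitlines()[-1]


if __name__ == "__main__":
    for stage, text in run_pipeline(stub_llm, seed=0).items():
        print(f"{stage}: {text}")
```

Swapping `stub_llm` for a real model client is the only change needed to try the idea; fixing `seed` keeps the injected constraints repeatable across runs.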

Problem

Research questions and friction points this paper is trying to address.

Automating constructed language creation without human expertise
Decomposing language design into modular generative stages
Ensuring coherence and diversity in generated languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-hop LLM pipeline for modular language design
Meta-linguistic LLM reasoning with randomness injection to encourage diversity
Self-refinement feedback to encourage consistency in the language description
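The self-refinement idea in the last item can be illustrated with a small critique-and-revise loop. This is a sketch under stated assumptions rather than the paper's procedure: in ConlangCrafter the critic and reviser would be LLM calls, whereas `toy_critique` and `toy_revise` below are hypothetical rule-based stand-ins that check lexicon words against an assumed phoneme `INVENTORY`.

```python
from typing import Callable, List

# Assumed phoneme inventory for the toy consistency check (one letter per phoneme).
INVENTORY = set("ptkmnaiu")


def self_refine(
    draft: str,
    critique: Callable[[str], List[str]],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Alternate critique and revision until no issues remain or rounds run out."""
    for _ in range(max_rounds):
        issues = critique(draft)
        if not issues:
            break
        draft = revise(draft, issues)
    return draft


def toy_critique(draft: str) -> List[str]:
    """Return the words that use a sound outside the inventory."""
    return [word for word in draft.split() if not set(word) <= INVENTORY]


def toy_revise(draft: str, issues: List[str]) -> str:
    """Replace each out-of-inventory sound in a flagged word with 'a'."""
    fixed = []
    for word in draft.split():
        if word in issues:
            word = "".join(ch if ch in INVENTORY else "a" for ch in word)
        fixed.append(word)
    return " ".join(fixed)


if __name__ == "__main__":
    print(self_refine("kata mizu pani", toy_critique, toy_revise))
```

The `max_rounds` cap is a design choice for the sketch: it bounds cost when the reviser cannot resolve every issue, at the price of possibly returning a draft that still has inconsistencies.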