🤖 AI Summary
Constructed language (conlang) design remains heavily reliant on expert linguistic knowledge, limiting its accessibility and scalability. Method: We propose the first end-to-end large language model (LLM)-driven framework for automated conlang generation. Our approach employs a multi-stage pipeline that decouples phonology, morphology, syntax, lexicon, and translation modules, augmented by multi-hop meta-linguistic reasoning, controllable stochasticity injection, and feedback-driven self-refinement, enabling fully autonomous, coherent language generation without human intervention. Contribution/Results: This work pioneers the systematic use of LLMs' meta-linguistic capabilities for full-stack conlang construction, balancing logical consistency with typological diversity. Experiments demonstrate significant improvements over baselines in structural coherence, cross-linguistic typological coverage, and translation fidelity, validating LLMs as effective computational creativity engines for formalized artificial language generation.
📝 Abstract
Constructed languages (conlangs) such as Esperanto and Quenya have played diverse roles in art, philosophy, and international communication. Meanwhile, large-scale foundation models have revolutionized creative generation in text, images, and beyond. In this work, we employ modern LLMs as computational creativity aids for end-to-end conlang creation. We introduce ConlangCrafter, a multi-hop pipeline that decomposes language design into modular stages -- phonology, morphology, syntax, lexicon, and translation. At each stage, our method exploits LLMs' meta-linguistic reasoning capabilities, injecting randomness to encourage diversity and applying self-refinement feedback to promote consistency in the emerging language description. We evaluate ConlangCrafter on metrics measuring coherence and typological diversity, demonstrating its ability to produce coherent and varied conlangs without human linguistic expertise.
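The staged pipeline described above can be sketched in code. This is a minimal illustrative skeleton, not the paper's actual implementation: the `llm` callable, the `STAGES` list, the typological "nudge" prompts, and the number of refinement rounds are all assumptions made for the sake of the example.

```python
import random

# Hypothetical stage order; the paper decomposes design into these five modules.
STAGES = ["phonology", "morphology", "syntax", "lexicon", "translation"]


def generate_conlang(llm, seed=0, refine_rounds=2):
    """Sketch of a multi-stage conlang pipeline.

    `llm` is any callable mapping a prompt string to generated text
    (a real model client or a stub). Each stage conditions on the
    language description produced so far.
    """
    rng = random.Random(seed)  # controllable stochasticity injection
    spec = {}                  # the emerging language description

    for stage in STAGES:
        # Inject randomness: sample a typological bias into the prompt
        # so repeated runs yield diverse languages.
        nudge = rng.choice(
            ["agglutinative", "fusional", "isolating", "polysynthetic"]
        )
        draft = llm(
            f"Design the {stage} of a new conlang (bias: {nudge}). "
            f"Existing description: {spec}"
        )
        # Feedback-driven self-refinement: critique the draft against
        # the accumulated description, then revise.
        for _ in range(refine_rounds):
            critique = llm(
                f"List inconsistencies between this {stage} draft and "
                f"the description {spec}:\n{draft}"
            )
            draft = llm(f"Revise the {stage} given this critique:\n{critique}\n{draft}")
        spec[stage] = draft

    return spec
```

With a stub model (`lambda prompt: "draft"`), the function returns a dictionary with one entry per stage in pipeline order, showing how later stages are conditioned on earlier ones via the shared `spec`.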