🤖 AI Summary
Existing LLM evaluation paradigms neglect interactive language acquisition: the core human capacity to acquire new languages autonomously through pattern recognition and real-time feedback.
Method: We propose the first benchmark framework for interactive language learning, built upon real-time conversational feedback. We design Tinkatongue, a novel artificial language, and an embodied robotic interaction environment to enable controlled, grounded language acquisition experiments.
Contribution/Results: Although LLM agents fail to achieve functional dialogue within 100 interaction rounds, they exhibit human-like learning behaviors, including tentative induction, hypothesis testing, and adaptive strategy revision, demonstrating emergent inductive reasoning under interaction constraints. This work exposes fundamental limitations of current LLMs in embodied, feedback-driven language acquisition and establishes the first process-oriented evaluation paradigm, shifting the focus from static linguistic competence to dynamic learning trajectories. It provides both theoretical foundations and empirical infrastructure for developing next-generation embodied language-learning models.
📝 Abstract
Existing evaluation studies on the linguistic competence of large language model (LLM) agents have focused primarily on vocabulary learning, morphological rule induction, syntactic generalization, pragmatic inference, and cross-linguistic transfer. However, none assess whether LLM agents can acquire a language through pattern recognition and interactive feedback, a central feature of human language acquisition. We propose a novel experimental framework in which an LLM agent is evaluated on its ability to acquire and use a newly constructed language (Tinkatongue) in conversation with a bot that understands only Tinkatongue. Our findings show that LLM agents fail to establish a functional conversation within 100 responses, yet they adopt distinct strategies that mirror human approaches to language learning. These results suggest a new direction for evaluation benchmarks and open pathways toward model designs that learn more effectively from interactive feedback.
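The evaluation protocol described above can be sketched as a simple interaction loop: the agent emits an utterance, the Tinkatongue-only bot returns feedback, and the benchmark checks whether a functional dialogue emerges within the 100-round budget. The sketch below is a minimal illustration under stated assumptions; the bot's acceptance rule, the feedback tokens, the success criterion, and all function names (`toy_bot`, `evaluate`, `naive_agent`) are hypothetical stand-ins, not the paper's actual implementation, since the abstract does not specify Tinkatongue's rules.

```python
# Hypothetical sketch of the interactive evaluation loop: all names, the
# bot's acceptance rule, and the feedback tokens are invented placeholders.

MAX_ROUNDS = 100  # interaction budget mentioned in the abstract


def toy_bot(utterance: str) -> tuple[str, bool]:
    """Stand-in for the Tinkatongue-only bot.

    Returns (feedback, understood). The acceptance rule here is a toy
    placeholder pattern, not real Tinkatongue grammar.
    """
    understood = utterance.endswith("ka")
    feedback = "tinka!" if understood else "ton-ton"  # invented feedback tokens
    return feedback, understood


def evaluate(agent_fn, needed_successes: int = 3) -> dict:
    """Run the loop; 'functional dialogue' here means several
    consecutive understood turns (an assumed success criterion)."""
    history, streak = [], 0
    for round_no in range(1, MAX_ROUNDS + 1):
        utterance = agent_fn(history)          # agent sees the full history
        feedback, understood = toy_bot(utterance)
        history.append((utterance, feedback))
        streak = streak + 1 if understood else 0
        if streak >= needed_successes:         # functional dialogue reached
            return {"success": True, "rounds": round_no}
    return {"success": False, "rounds": MAX_ROUNDS}


def naive_agent(history):
    """Trivial hypothesis-testing agent: cycle through candidate suffixes
    and repeat whatever last drew positive feedback."""
    if history and history[-1][1] == "tinka!":
        return history[-1][0]                  # repeat what worked
    suffixes = ["ta", "ka", "mo"]
    return "bana" + suffixes[len(history) % len(suffixes)]


print(evaluate(naive_agent))  # → {'success': True, 'rounds': 4}
```

Even this trivial agent illustrates the behaviors the paper reports, tentative induction and repetition of confirmed hypotheses, while the real benchmark replaces `naive_agent` with an LLM and `toy_bot` with a bot grounded in Tinkatongue's actual grammar.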