A Characterization of List Language Identification in the Limit

📅 2025-11-06

🤖 AI Summary

This paper investigates language identification in the limit when the learner may output lists: given a sequence of examples from a target language, the learner outputs a list of $k$ candidate languages at each step, and the target language must appear in every list after some finite time. The paper introduces and characterizes *$k$-list identifiability*, proving that a collection is $k$-list identifiable in the limit if and only if it can be decomposed into $k$ subcollections, each identifiable in the limit in the classical sense (with a list of size $1$). The characterization rests on a recursive version of Angluin's condition. Using it, the paper then studies the statistical setting in which examples arrive as an i.i.d. stream from a distribution supported on the target language: if a collection is $k$-list identifiable in the limit, it can be $k$-list identified with error decaying exponentially in the number of examples, and this rate is best possible; otherwise, no rate tending to zero is achievable. The result is an exact (necessary and sufficient) characterization of list identification in the limit together with matching rates, connecting the inductive-inference and statistical views of language identification.
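As a toy illustration of an exponential rate (our own example, not the paper's construction), take a single finite target language $L$, draw examples i.i.d. from a distribution $p$ supported on $L$, and let the learner guess exactly the set of examples seen so far, which identifies every finite language in the limit. After $t$ draws this learner errs only if some word of $L$ has not yet appeared, so by a union bound its error is at most $\sum_{x \in L} (1 - p(x))^t \le |L|\,(1 - \min_x p(x))^t$. The sketch below computes the exact error by inclusion-exclusion and compares it with that bound; the function names and the example distribution are hypothetical.

```python
from itertools import combinations
from typing import Dict

# Hypothetical example: target language {a, b, c} with a non-uniform sampling distribution.
EXAMPLE_P: Dict[str, float] = {"a": 0.5, "b": 0.3, "c": 0.2}


def exact_error(p: Dict[str, float], t: int) -> float:
    """Probability that some word of the target is still unseen after t i.i.d. draws.

    This equals the error at time t of the toy learner that guesses
    "exactly the set of examples seen so far". Computed by inclusion-exclusion
    over the set of missing words.
    """
    words = list(p)
    total = 0.0
    for r in range(1, len(words) + 1):
        for missing in combinations(words, r):
            # Every one of the t draws must avoid all words in `missing`.
            avoid = (1.0 - sum(p[w] for w in missing)) ** t
            total += (-1) ** (r + 1) * avoid
    return total


def union_bound(p: Dict[str, float], t: int) -> float:
    """Union bound sum_x (1 - p(x))^t, which decays exponentially in t."""
    return sum((1.0 - q) ** t for q in p.values())


if __name__ == "__main__":
    for t in (1, 5, 10, 20, 40):
        print(f"t={t:>2}  exact={exact_error(EXAMPLE_P, t):.3e}  bound={union_bound(EXAMPLE_P, t):.3e}")
```

Running the script prints, for each $t$, the exact error next to the union bound; both shrink geometrically, governed mainly by the least likely word.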

📝 Abstract

We study the problem of language identification in the limit, where given a sequence of examples from a target language, the goal of the learner is to output a sequence of guesses for the target language such that all the guesses beyond some finite time are correct. Classical results of Gold showed that language identification in the limit is impossible for essentially any interesting collection of languages. Later, Angluin gave a precise characterization of language collections for which this task is possible. Motivated by recent positive results for the related problem of language generation, we revisit the classic language identification problem in the setting where the learner is given the additional power of producing a list of $k$ guesses at each time step. The goal is to ensure that beyond some finite time, one of the guesses is correct at each time step. We give an exact characterization of collections of languages that can be $k$-list identified in the limit, based on a recursive version of Angluin's characterization (for language identification with a list of size $1$). This further leads to a conceptually appealing characterization: A language collection can be $k$-list identified in the limit if and only if the collection can be decomposed into $k$ collections of languages, each of which can be identified in the limit (with a list of size $1$). We also use our characterization to establish rates for list identification in the statistical setting where the input is drawn as an i.i.d. stream from a distribution supported on some language in the collection. Our results show that if a collection is $k$-list identifiable in the limit, then the collection can be $k$-list identified at an exponential rate, and this is best possible. On the other hand, if a collection is not $k$-list identifiable in the limit, then it cannot be $k$-list identified at any rate that goes to zero.
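For $k = 1$ and ignoring computability, Angluin's characterization can be stated as a tell-tale condition: an indexed collection $L_1, L_2, \dots$ is identifiable in the limit if and only if each $L_i$ has a finite subset $T_i \subseteq L_i$ such that no $L_j$ in the collection satisfies $T_i \subseteq L_j \subsetneq L_i$. The sketch below shows the standard learner this condition supports: at step $t$, output the first $L_i$ with $i \le t$ whose tell-tale has been seen and which contains every example so far. It assumes the tell-tales are handed to the learner (the condition only guarantees that they exist), represents languages by hypothetical membership predicates, and does not reproduce the paper's recursive condition for $k > 1$.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class Language:
    name: str
    contains: Callable[[int], bool]  # membership predicate (hypothetical representation)
    telltale: FrozenSet[int]         # finite tell-tale T_i, assumed to be supplied by hand


def telltale_learner(collection: Sequence[Language], stream: Iterable[int]) -> List[Optional[str]]:
    """At step t, guess the first L_i with i <= t such that T_i is among the examples
    seen so far and every example seen so far lies in L_i (None if no index qualifies)."""
    seen: set = set()
    guesses: List[Optional[str]] = []
    for t, x in enumerate(stream, start=1):
        seen.add(x)
        guess: Optional[str] = None
        for lang in collection[:t]:
            if lang.telltale <= seen and all(lang.contains(y) for y in seen):
                guess = lang.name
                break
        guesses.append(guess)
    return guesses


# Toy collection: multiples of 4, even numbers, all naturals.
# Each tell-tale rules out every proper sub-language within the collection.
EXAMPLE_COLLECTION = [
    Language("mult4", lambda n: n % 4 == 0, frozenset({0})),
    Language("even", lambda n: n % 2 == 0, frozenset({2})),
    Language("nat", lambda n: n >= 0, frozenset({1})),
]
```

On the stream 4, 8, 0, 6, 2, 10 drawn from the even numbers, the guesses are `None`, `None`, `mult4`, `None`, `even`, `even`: the learner briefly settles on the smaller language, abandons it once 6 appears, and commits to the evens once their tell-tale 2 has been seen.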

Problem

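Characterizing which language collections can be identified in the limit when the learner outputs $k$ guesses at each step

Extending Angluin's identification framework to learners that output multiple hypotheses at each step

Establishing identification rates in the statistical setting, where examples are drawn i.i.d. from a distribution supported on the target language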

Innovation

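Lets the learner output a list of $k$ guesses at each step, requiring only that one guess is correct at every step beyond some finite time

Characterizes $k$-list identifiability through a recursive version of Angluin's characterization

Shows that a collection is $k$-list identifiable if and only if it decomposes into $k$ subcollections, each identifiable in the limit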
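The "if" direction of the decomposition result can be sketched directly: if a collection splits into $k$ subcollections, each with a learner that identifies it in the limit, then outputting the $k$ learners' current guesses as the list $k$-list identifies the whole collection, because the learner for the subcollection containing the target eventually always guesses it. The toy instance below (our own choice of example) uses Gold's classic collection of all finite subsets of the natural numbers together with the set of all natural numbers, which cannot be identified in the limit with a single guess but splits into two pieces that can. Names such as `list_learner` and the label `"N"` are hypothetical.

```python
from collections.abc import Hashable
from typing import Callable, FrozenSet, Iterable, List, Sequence

# A learner maps the finite set of examples seen so far to a guess (any hashable label).
Learner = Callable[[FrozenSet[int]], Hashable]


def list_learner(learners: Sequence[Learner], stream: Iterable[int]) -> List[List[Hashable]]:
    """Run k learners side by side; the list at each step is their k current guesses."""
    seen: set = set()
    lists: List[List[Hashable]] = []
    for x in stream:
        seen.add(x)
        snapshot = frozenset(seen)
        lists.append([learn(snapshot) for learn in learners])
    return lists


NATURALS = "N"  # hypothetical label for the infinite language of all natural numbers


def naturals_learner(seen: FrozenSet[int]) -> Hashable:
    # Subcollection {N} has a single language, so always guessing it is correct.
    return NATURALS


def finite_sets_learner(seen: FrozenSet[int]) -> Hashable:
    # Subcollection of all finite sets: once every element of a finite target has
    # appeared, the set of examples seen equals the target at every later step.
    return seen
```

On the stream 2, 2, 1, 3, 1, 2 from the finite target {1, 2, 3}, the second slot equals the target from the fourth step on, while the first slot always holds the label for the naturals, so the list is correct whichever kind of language generated the stream.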