Modality Matching Matters: Calibrating Language Distances for Cross-Lingual Transfer in URIEL+

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing linguistic knowledge bases (e.g., URIEL+) provide geographic, genealogical, and typological distances with two key limitations: (1) they rely on homogeneous vector representations, failing to capture heterogeneous structural relationships among languages; and (2) they lack interpretable, task-agnostic mechanisms for distance aggregation. This work proposes the *Typologically Aligned Linguistic Distance Framework*, the first to jointly integrate structure-aware modeling with principled fusion. Specifically, it employs speaker-weighted distributions for geographic distance, hyperbolic space embeddings for genealogical distance, and latent variable models for typological distance—each tailored to its respective relational structure. A unified, robust composite distance metric is then derived. Evaluated on multilingual NLP tasks, the framework significantly improves cross-lingual transfer performance, demonstrating both the efficacy of structurally adaptive language representations and the generalizability of interpretable, compositional distance aggregation.
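The genealogical component relies on hyperbolic geometry, where tree-like family structures embed with low distortion. A minimal sketch of the Poincaré-ball distance commonly used for such embeddings (the paper's exact embedding construction is not given here; `poincare_distance` and the toy vectors are illustrative):

```python
import math

def poincare_distance(u, v):
    """Distance between two points inside the unit Poincare ball.

    d(u, v) = arccosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    Points near the boundary are exponentially far apart, which suits
    deep genealogical trees.
    """
    sq = lambda x: sum(xi * xi for xi in x)
    diff = [ui - vi for ui, vi in zip(u, v)]
    return math.acosh(1.0 + 2.0 * sq(diff) / ((1.0 - sq(u)) * (1.0 - sq(v))))

# Illustrative 2-d embeddings: a "parent" language near the origin,
# two "daughter" languages pushed toward the boundary.
parent = [0.1, 0.0]
daughter_a = [0.7, 0.2]
daughter_b = [0.7, -0.2]
print(poincare_distance(parent, daughter_a))
print(poincare_distance(daughter_a, daughter_b))
```

The design point is that hyperbolic space's exponential volume growth mirrors the exponential branching of family trees, which flat Euclidean vectors cannot capture at low dimension.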

📝 Abstract
Existing linguistic knowledge bases such as URIEL+ provide valuable geographic, genetic, and typological distances for cross-lingual transfer, but suffer from two key limitations. First, their one-size-fits-all vector representations are ill-suited to the diverse structures of linguistic data; second, they lack a principled method for aggregating these signals into a single, comprehensive score. In this paper, we address these gaps by introducing a framework for type-matched language distances. We propose novel, structure-aware representations for each distance type: speaker-weighted distributions for geography, hyperbolic embeddings for genealogy, and a latent variable model for typology. We then unify these signals into a robust, task-agnostic composite distance. When used to select transfer languages, our representations and composite distances consistently improve performance across a wide range of NLP tasks, providing a more principled and effective toolkit for multilingual research.
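One natural reading of "speaker-weighted distributions for geography" is to treat each language as a probability distribution over speaker locations and take the expected great-circle distance between a random speaker of each language. The sketch below follows that reading; the `haversine_km` helper, the coordinates, and the speaker counts are all illustrative, not data from the paper:

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2.0 * r * math.asin(math.sqrt(a))

def speaker_weighted_distance(lang_a, lang_b):
    """Expected distance between a random speaker of each language.

    Each language is a list of ((lat, lon), speaker_count) pairs; counts
    are normalized into a probability distribution over locations.
    """
    total_a = sum(w for _, w in lang_a)
    total_b = sum(w for _, w in lang_b)
    return sum(
        (wa / total_a) * (wb / total_b) * haversine_km(pa, pb)
        for pa, wa in lang_a
        for pb, wb in lang_b
    )

# Toy populations: two speaker hubs for lang_x, one for lang_y.
lang_x = [((48.0, 2.0), 60.0), ((46.0, 6.0), 8.0)]
lang_y = [((41.0, 12.0), 55.0)]
print(round(speaker_weighted_distance(lang_x, lang_y), 1))
```

Unlike a single-centroid distance, this weighting lets widely dispersed languages contribute geographic signal from everywhere their speakers actually live.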
Problem

Research questions and friction points this paper is trying to address.

Improving cross-lingual transfer by calibrating language distances
Creating structure-aware representations for linguistic data types
Developing unified composite distances for multilingual NLP tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structure-aware representations for diverse linguistic data types
Hyperbolic embeddings model genealogical language relationships
Composite distance unifies signals for cross-lingual transfer
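One simple way to realize the composite-distance idea above is to min–max normalize each signal over a candidate pool of transfer languages and take a weighted average; the paper's actual fusion method may differ, and the equal weights and sample values below are illustrative only:

```python
def normalize(values):
    """Min-max scale a list of distances to [0, 1]; constant lists map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def composite_distance(signals, weights=None):
    """Fuse per-type distance lists into one score per candidate.

    signals: dict mapping signal name -> list of distances, aligned by
    candidate index. Defaults to equal weights across signal types.
    """
    names = list(signals)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    scaled = {n: normalize(signals[n]) for n in names}
    n_cand = len(next(iter(signals.values())))
    return [sum(weights[n] * scaled[n][i] for n in names) for i in range(n_cand)]

# Toy candidate pool of three transfer languages (values are made up).
signals = {
    "geographic":   [120.0, 900.0, 450.0],
    "genealogical": [0.3, 1.4, 0.9],
    "typological":  [0.2, 0.8, 0.1],
}
scores = composite_distance(signals)
best = min(range(len(scores)), key=scores.__getitem__)  # index 0 here
```

Normalizing before fusing keeps any one signal (e.g., kilometers vs. unit-scale typology) from dominating purely because of its units, which is what makes the composite task-agnostic.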