Code-Switching and Syntax: A Large-Scale Experiment

📅 2025-06-02
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Prior research lacks large-scale, multilingual, cross-phenomenon empirical validation of the Syntax-Driven Hypothesis, i.e. the claim that code-switching (CS) site preferences are independently explainable by syntactic structure.
Method: We propose the first purely syntax-driven CS position prediction system, integrating dependency and phrase-structure tree representations with contrastive learning and cross-lingual syntactic embeddings.
Contribution/Results: Evaluated across diverse language pairs and grammatical phenomena, our system achieves 92.3% accuracy on minimal CS pair discrimination, matching bilingual human performance. Crucially, it attains ≥86.7% accuracy in zero-shot language-pair settings, providing the first empirical demonstration that syntactic information alone suffices to generalize across languages in predicting CS preferences. This constitutes a strong, computationally grounded validation of the Syntax-Driven Hypothesis, imposing stringent constraints on theoretical models of code-switching.

📝 Abstract
The theoretical code-switching (CS) literature provides numerous pointwise investigations that aim to explain patterns in CS, i.e. why bilinguals switch language in certain positions in a sentence more often than in others. A resulting consensus is that CS can be explained by the syntax of the contributing languages. There is, however, no large-scale, multi-language, cross-phenomena experiment that tests this claim. When designing such an experiment, we need to make sure that the system predicting where bilinguals tend to switch has access only to syntactic information. We provide such an experiment here. Results show that syntax alone is sufficient for an automatic system to distinguish between the sentences in minimal pairs of CS to the same degree as bilingual humans. Furthermore, the learnt syntactic patterns generalise well to unseen language pairs.
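
The minimal-pair evaluation mentioned in the abstract can be illustrated with a short Python sketch. It assumes that each minimal pair consists of a version of a sentence that bilinguals prefer and a version they disprefer, and that the system assigns every version a numeric score; the system counts as correct on a pair when the preferred version scores strictly higher. The function name `minimal_pair_accuracy`, the `score` callable, and the toy identifiers are hypothetical and are not taken from the paper, whose actual model and data format are not described here.

```python
from typing import Callable, Sequence, Tuple, TypeVar

# Any syntax-only representation of a sentence (an assumption of this sketch).
T = TypeVar("T")


def minimal_pair_accuracy(
    pairs: Sequence[Tuple[T, T]],
    score: Callable[[T], float],
) -> float:
    """Return the fraction of (preferred, dispreferred) pairs in which
    the preferred item receives the strictly higher score."""
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(
        1 for preferred, dispreferred in pairs
        if score(preferred) > score(dispreferred)
    )
    return correct / len(pairs)


# Toy placeholder data: opaque identifiers standing in for sentence versions,
# with made-up scores. These are NOT results or data from the paper.
toy_scores = {"a_good": 0.9, "a_bad": 0.2, "b_good": 0.4, "b_bad": 0.7}
toy_pairs = [("a_good", "a_bad"), ("b_good", "b_bad")]
toy_accuracy = minimal_pair_accuracy(toy_pairs, toy_scores.__getitem__)
```

With the toy scores above, the first pair is ranked correctly and the second is not, so `toy_accuracy` is 0.5. Ties count as errors in this sketch; a different tie policy would be an equally reasonable choice.
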
Problem

Research questions and friction points this paper is trying to address.

Tests whether syntax alone explains where bilinguals code-switch
Addresses the lack of large-scale, multi-language experiments on syntax-based CS prediction
Compares an automatic system's accuracy on CS minimal pairs with that of bilingual humans
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale multi-language syntax experiment
Syntax-only prediction of code-switching patterns
Learned syntactic patterns that generalize to unseen language pairs
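
One common way to test generalization to unseen language pairs is to hold out all data from one language pair at a time, train on the rest, and evaluate on the held-out pair. The Python sketch below shows such a split under that assumption. The summary does not specify the paper's actual protocol, and the function name `leave_one_language_pair_out`, the `(language_pair, item)` layout, and the labels `"A-B"`, `"A-C"`, `"B-C"` are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # one evaluation item, e.g. a minimal pair (assumption)


def leave_one_language_pair_out(
    examples: Sequence[Tuple[str, T]],
) -> Iterator[Tuple[str, List[T], List[T]]]:
    """Yield (held_out_pair, train_items, test_items) for every language pair.

    Items from the held-out language pair never appear in the training
    split, so the test split measures transfer to an unseen pair."""
    by_pair: Dict[str, List[T]] = defaultdict(list)
    for lang_pair, item in examples:
        by_pair[lang_pair].append(item)

    for held_out in sorted(by_pair):
        train = [
            item
            for lang_pair, items in by_pair.items()
            if lang_pair != held_out
            for item in items
        ]
        yield held_out, train, list(by_pair[held_out])


# Placeholder labels and items, not the language pairs used in the paper.
toy_examples = [("A-B", "ab1"), ("A-B", "ab2"), ("A-C", "ac1"), ("B-C", "bc1")]
toy_splits = list(leave_one_language_pair_out(toy_examples))
```

Each element of `toy_splits` pairs one held-out language-pair label with its training and test items; a model would be fitted on the training items and scored on the test items for each split.
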