🤖 AI Summary
This work addresses language identification for 14 low-resource African languages under extreme constraints: it uses only the LRE22 development set and no pre-trained models.
Method: We propose the first fully pre-training-free, data-augmentation-driven multi-classifier fusion framework. It applies diverse audio augmentations, including time-frequency masking, speed perturbation, and additive noise, before extracting x-vector embeddings, and then fuses SVM and ECAPA-TDNN classifiers. The design prioritizes both low-resource adaptability and edge-deployment efficiency.
Contribution/Results: Evaluated on the LRE22 development set, our framework achieves an EER of 11.43% and a Cavg of 0.41, substantially outperforming baseline methods. This demonstrates the effectiveness and practicality of pre-training-free paradigms for ultra-low-resource spoken language identification.
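To make the three augmentations named in the Method summary concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the submission's actual pipeline: the function names, the white-noise model, the linear-interpolation resampler, and all parameter values are hypothetical choices.

```python
import numpy as np

def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB).
    Assumption: white noise; a real system may mix in recorded noise instead."""
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=wave.shape)
    return wave + noise

def speed_perturb(wave, factor):
    """Resample by linear interpolation; factor > 1 shortens (speeds up) the signal."""
    n_out = int(round(len(wave) / factor))
    old_idx = np.arange(len(wave))
    new_idx = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(new_idx, old_idx, wave)

def time_freq_mask(spec, max_freq_width, max_time_width, rng):
    """Zero one random frequency band and one random time span
    of a (freq, time) spectrogram. Returns a masked copy."""
    spec = spec.copy()
    n_freq, n_time = spec.shape
    f_width = rng.integers(0, max_freq_width + 1)
    f_start = rng.integers(0, n_freq - f_width + 1)
    spec[f_start:f_start + f_width, :] = 0.0
    t_width = rng.integers(0, max_time_width + 1)
    t_start = rng.integers(0, n_time - t_width + 1)
    spec[:, t_start:t_start + t_width] = 0.0
    return spec

# Example usage on a synthetic one-second, 16 kHz tone (hypothetical values).
rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy = add_noise(wave, snr_db=10, rng=rng)
faster = speed_perturb(wave, factor=1.1)
spec = np.ones((40, 100))
masked = time_freq_mask(spec, max_freq_width=8, max_time_width=20, rng=rng)
```

Each function produces a new training view of the same utterance, which is how augmentation can enlarge a small development set without any external data.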
📝 Abstract
This is the detailed system description of the IITKGP-ABSP lab's submission to the NIST Language Recognition Evaluation (LRE) 2022. The objective of LRE22 is to recognize 14 low-resource African languages. Even though NIST has provided additional training and development data, we develop our systems under the additional constraint of an extreme low-resource setting. Our primary fixed-set submission uses only the LRE22 development data, which contains utterances of the 14 target languages. We further restrict our system from using any pre-trained models for feature extraction or classifier fine-tuning. To address the scarcity of training data, our system relies on diverse audio augmentations followed by classifier fusion. Abiding by all the constraints, the proposed methods achieve an EER of 11.43% and a cost metric of 0.41 on the LRE22 development set. For users with limited computational resources or limited storage/network capabilities, the proposed system offers an efficient route to LID.
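For readers unfamiliar with the reported metric, the equal error rate (EER) is the operating point at which the miss rate equals the false-alarm rate. The sketch below shows one simple way to estimate it from detection scores. The function name and the threshold sweep are illustrative assumptions, not the official NIST scoring tool, and the cost metric is not computed here.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.
    A trial is accepted when its score is >= the threshold."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # Miss rate: fraction of target trials rejected at each threshold.
    p_miss = np.array([(target_scores < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials accepted at each threshold.
    p_fa = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[idx] + p_fa[idx]) / 2

# Example usage with toy scores (hypothetical values).
eer_separable = equal_error_rate([2.0, 3.0], [0.0, 1.0])
eer_overlapping = equal_error_rate([0.0, 1.0], [0.0, 1.0])
```

With perfectly separated scores the estimate is zero, and with identical score distributions it reaches one half, which brackets the reported 11.43% between the ideal and chance-level cases.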