🤖 AI Summary
Existing ASR resources for Portuguese focus mostly on Brazilian Portuguese, leaving European Portuguese (EP) and other varieties under-explored. To address this gap, the authors introduce CAMÕES, the first open framework for ASR in EP and other Portuguese varieties. The work makes three main points: (1) it builds a comprehensive evaluation benchmark that includes 46 hours of EP test data spanning multiple domains; (2) it provides a collection of models covering several foundation models, evaluated both zero-shot and after fine-tuning, alongside E-Branchformer models trained from scratch, with a curated set of 425 hours of EP data used for both fine-tuning and training; (3) fine-tuned foundation models and the E-Branchformer perform comparably on EP, and the best-performing models achieve a relative WER improvement above 35% over the strongest zero-shot foundation model. These results establish a new state of the art for EP and other Portuguese varieties.
📝 Abstract
Existing resources for Automatic Speech Recognition (ASR) in Portuguese are mostly focused on Brazilian Portuguese, leaving European Portuguese (EP) and other varieties under-explored. To bridge this gap, we introduce CAMÕES, the first open framework for EP and other Portuguese varieties. It consists of (1) a comprehensive evaluation benchmark, including 46h of EP test data spanning multiple domains; and (2) a collection of state-of-the-art models. For the latter, we consider multiple foundation models, evaluating their zero-shot and fine-tuned performance, as well as E-Branchformer models trained from scratch. A curated set of 425h of EP data was used for both fine-tuning and training. Our results show comparable performance on EP between fine-tuned foundation models and the E-Branchformer. Furthermore, the best-performing models achieve relative WER improvements above 35% compared to the strongest zero-shot foundation model, establishing a new state of the art for EP and other varieties.
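To make the headline metric concrete, the sketch below shows one common way to compute word error rate (WER) and a relative WER improvement between a baseline and an improved system. It is a minimal illustration, not the evaluation code used for CAMÕES: the function names, the whitespace tokenization, the example sentence, and the WER values are all assumptions chosen for the example, and any text normalization applied in the actual benchmark is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumes plain whitespace tokenization (an illustrative simplification).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only two rows of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, system_wer: float) -> float:
    """Fraction by which system_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentence pair: one deleted word out of five reference words.
    print(word_error_rate("o gato dorme no sofá", "o gato dorme sofá"))
    # Hypothetical WERs: a drop from 20% to 13% is a 35% relative improvement.
    print(relative_wer_improvement(0.20, 0.13))
```

Note that a relative improvement is measured against the baseline's own WER, so a 35% relative improvement does not mean the absolute WER falls by 35 percentage points.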