🤖 AI Summary
To address the suboptimal performance and high deployment costs of large language models (LLMs) for Portuguese, particularly Brazilian Portuguese, this work takes a domain-specialization approach. It introduces Sabiá-3, a flagship model, and Sabiazinho-3, a more cost-effective variant, both trained on a large Brazilian-centric corpus. The instruction-tuned models are evaluated across diverse professional and academic benchmarks, where they perform strongly on Portuguese and Brazil-related tasks; Sabiá-3 improves substantially on its predecessor, Sabiá-2 Medium, especially on reasoning-intensive tasks. Notably, Sabiá-3's average performance matches that of frontier LLMs. Key contributions include: (1) empirical support for a domain-data-driven specialization strategy; (2) competitive performance on Portuguese-language benchmarks; and (3) a per-token cost three to four times lower than that of frontier models (roughly a 67% to 75% reduction), showing that domain specialization can pay off in both capability and cost-efficiency.
📝 Abstract
This report presents Sabiá-3, our new flagship language model, and Sabiazinho-3, a more cost-effective sibling. The models were trained on a large Brazilian-centric corpus. Evaluations across diverse professional and academic benchmarks show strong performance on Portuguese and Brazil-related tasks. Sabiá-3 shows large improvements over our previous best model, Sabiá-2 Medium, especially in reasoning-intensive tasks. Notably, Sabiá-3's average performance matches that of frontier LLMs, while it is offered at a cost per token three to four times lower, reinforcing the benefits of domain specialization.