Tiny Aya: Bridging Scale and Multilingual Depth

📅 2026-03-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of achieving efficient and linguistically balanced multilingual understanding and generation under a constrained model size. The authors propose a multilingual scaling approach centered on efficiency and language balance, training a 3.35B-parameter model covering 70 languages. By combining large-scale multilingual pretraining, region-aware instruction tuning, and carefully calibrated data mixing ratios, they release both a globally balanced model and three region-specialized variants. The resulting models achieve state-of-the-art translation quality alongside strong multilingual comprehension and target-language generation, while remaining practical for real-world deployment.
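The summary credits much of this balance to "carefully calibrated data mixing ratios" across the 70 pretraining languages. The excerpt here does not give the exact scheme, so below is a minimal sketch of the standard technique for this kind of balancing, temperature-based sampling (as popularized by models like XLM-R and mT5); the exponent and corpus sizes are illustrative assumptions, not figures from the paper.

```python
# Minimal sketch of temperature-based data mixing for multilingual
# pretraining. This is NOT Tiny Aya's published scheme; it is a common
# baseline shown purely for illustration.

def mixing_ratios(corpus_tokens: dict[str, float], alpha: float = 0.3) -> dict[str, float]:
    """Per-language sampling probabilities.

    alpha = 1.0 reproduces the raw token proportions; alpha < 1.0
    flattens the distribution, up-weighting low-resource languages.
    """
    total = sum(corpus_tokens.values())
    weights = {lang: (n / total) ** alpha for lang, n in corpus_tokens.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}

# Hypothetical corpus sizes in billions of tokens (invented for the example).
sizes = {"en": 500.0, "hi": 30.0, "sw": 2.0}
print(mixing_ratios(sizes, alpha=1.0))  # raw proportions: en ~94%, sw ~0.4%
print(mixing_ratios(sizes, alpha=0.3))  # flattened: en ~62%, sw ~12%
```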

๐Ÿ“ Abstract
Tiny Aya redefines what a small multilingual language model can achieve. Trained on 70 languages and refined through region-aware posttraining, it delivers state-of-the-art translation quality, strong multilingual understanding, and high-quality target-language generation, all with just 3.35B parameters. The release includes a pretrained foundation model, a globally balanced instruction-tuned variant, and three region-specialized models targeting languages from Africa, South Asia, Europe, Asia-Pacific, and West Asia. This report details the training strategy, data composition, and comprehensive evaluation framework behind Tiny Aya, and presents an alternative scaling path for multilingual AI: one centered on efficiency, balanced performance across languages, and practical deployment.
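The three region-specialized variants come from region-aware posttraining on regionally grouped language sets. As a hedged sketch only: the groupings, language codes, and record schema below are hypothetical and not taken from the report, but they illustrate how instruction data might be routed to each variant.

```python
# Hedged sketch of region-aware instruction-data selection.
# Region groupings, language codes, and the record schema are
# hypothetical; the report does not specify this interface.

REGION_LANGS = {
    # Three variants covering five regions implies some grouping;
    # these particular pairings are assumptions.
    "africa_south_asia": {"sw", "am", "yo", "hi", "bn", "ta", "ur"},
    "europe": {"fr", "de", "es", "it", "pl", "uk"},
    "asia_pacific_west_asia": {"zh", "ja", "ko", "id", "vi", "ar", "tr"},
}

def select_region_data(examples: list[dict], variant: str) -> list[dict]:
    """Keep instruction examples whose language belongs to the variant's regions."""
    langs = REGION_LANGS[variant]
    return [ex for ex in examples if ex.get("lang") in langs]

data = [
    {"lang": "sw", "prompt": "...", "completion": "..."},
    {"lang": "de", "prompt": "...", "completion": "..."},
]
print(len(select_region_data(data, "europe")))  # -> 1
```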
Problem

Research questions and friction points this paper is trying to address.

multilingual language model
small-scale AI
translation quality
balanced multilingual performance
efficient deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual language model
region-aware posttraining
parameter efficiency
balanced multilingual performance
small-scale LLM
Authors

Alejandro R. Salamanca
Cohere Labs
Diana Abagyan
Cohere
Daniel D'souza
Senior Data Scientist
Ammar Khairi
Research Scholar, Cohere Labs
Statistical Machine Learning, NLP
David Mora
Cohere
Saurabh Dash
Georgia Institute of Technology
Point Processes, Dynamical Systems, Generative Models
Viraat Aryabumi
Cohere
Sara Rajaee
Ph.D. student at University of Amsterdam
Natural language processing, Artificial Intelligence
Mehrnaz Mofakhami
Cohere Labs
Ananya Sahu
Cohere Labs
Thomas Euyang
Cohere Labs
Brittawnya Prince
Cohere Labs
Madeline Smith
Cohere Labs
Hangyu Lin
Fudan University
Machine Learning, Computer Vision
Acyr Locatelli
Cohere
Machine Learning, Neural Networks, Geometry, Real Algebraic Geometry
Sara Hooker
Head of Cohere For AI
Machine learning efficiency, robustness, interpretability, trustworthy ML
Tom Kocmi
Cohere
Multilingual Evaluation, LLMs, Machine Translation
Aidan Gomez
Cohere
Artificial Intelligence, Deep Learning
Ivan Zhang
Cohere
Deep Learning
Phil Blunsom
Cohere & St Hugh's College Oxford
Artificial Intelligence, Machine Learning, Computational Linguistics, Natural Language Processing
Nick Frosst
Cohere
Joelle Pineau
School of Computer Science, McGill University; FAIR, Meta AI; Mila
Artificial intelligence, Machine learning, Robotics
Beyza Ermis
Cohere Labs
Ahmet Üstün
Cohere For AI
Machine Learning, Large Language Models
Julia Kreutzer
Senior Research Scientist at Cohere Labs
Multilingual NLP, Reinforcement Learning for NLP, Low-Resource NLP