🤖 AI Summary
Existing open-source large language models (LLMs) fall short in long-context reasoning, structured function calling, and multi-task generalization. Method: Olmo 3 is a fully open, end-to-end reproducible family of 7B and 32B LLMs, combining high-quality multi-task pretraining, reinforcement-learning-based reasoning alignment, explicit function-call modeling, and long-context optimization. Crucially, the release includes all training data, intermediate checkpoints, and software dependencies. Contribution/Results: The flagship model, Olmo 3 Think 32B, is the strongest fully open "reasoning-first" model to date. It achieves state-of-the-art results among fully open models on math and knowledge benchmarks (GSM8K, MMLU), coding (HumanEval, MBPP), and instruction following (AlpacaEval 2.0), and outperforms open-weight baselines such as Llama 3-70B. By enabling full transparency and reproducibility, Olmo 3 advances auditable, trustworthy, and scientifically rigorous LLM research.
📝 Abstract
We introduce Olmo 3, a family of state-of-the-art, fully open language models at the 7B and 32B parameter scales. Olmo 3 is built for long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. This release includes the entire model flow: the full lifecycle of the model family, covering every stage, checkpoint, data point, and dependency used to build it. Our flagship model, Olmo 3 Think 32B, is the strongest fully open thinking model released to date.