🤖 AI Summary
Existing open-source large language models (LLMs) fall short in long-context reasoning, structured function calling, and multi-task generalization. Method: Olmo 3 is a fully open, end-to-end reproducible family of 7B and 32B LLMs, combining high-quality multi-task pretraining, reinforcement-learning-based reasoning alignment, explicit function-call modeling, and long-context optimization. Crucially, the release includes all training data, intermediate checkpoints, and software dependencies. Contribution/Results: The flagship model, Olmo 3 Think 32B, is the strongest fully open "reasoning-first" model to date. It achieves state-of-the-art results among fully open models on math and knowledge benchmarks (GSM8K, MMLU), coding (HumanEval, MBPP), and instruction following (AlpacaEval 2.0), and outperforms open-weight baselines such as Llama 3-70B. By enabling full transparency and reproducibility, Olmo 3 advances auditable, trustworthy, and scientifically rigorous LLM research.
📝 Abstract
We introduce Olmo 3, a family of state-of-the-art, fully open language models at the 7B and 32B parameter scales. Olmo 3 is built for long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. This release includes the entire model flow: the full lifecycle of the model family, covering every stage, checkpoint, data point, and dependency used to build it. Our flagship model, Olmo 3 Think 32B, is the strongest fully open thinking model released to date.