🤖 AI Summary
This work addresses the lack of efficient, multifunctional small language models suitable for compute- and memory-constrained environments by introducing the Ministral 3 series—parameter-efficient dense models at 3B, 8B, and 14B scales. Each variant is released in three versions: base pretrained, instruction-tuned, and reasoning-optimized, with support for multimodal image understanding. The core innovation lies in a cascaded distillation approach that integrates iterative pruning, continual knowledge distillation, and multitask continued pretraining, achieving substantial gains in inference efficiency without compromising performance. Evaluated across complex reasoning and general-purpose tasks, the entire model family demonstrates strong empirical results and is released under the Apache 2.0 license.
📝 Abstract
We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, a technique that combines iterative pruning with continued training under distillation. Each model comes with image understanding capabilities, all under the Apache 2.0 license.
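The abstract describes Cascade Distillation only at a high level: alternate pruning of the model with continued training under a distillation loss from the larger parent. The exact recipe is not given here, so the following is only a minimal NumPy sketch of that alternation on a toy linear "model": magnitude pruning, then gradient descent on a temperature-softened KL loss against fixed teacher logits. All function names, the temperature, learning rate, and prune fraction are illustrative assumptions, not the paper's method.

```python
import numpy as np

def softmax(z, T=1.0):
    # numerically stable softmax over temperature-scaled logits
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss_and_grad(W_s, x, teacher_logits, T=2.0):
    """Softened KL(teacher || student) and its gradient w.r.t. student weights."""
    p = softmax(teacher_logits, T)
    q = softmax(x @ W_s, T)
    loss = float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)
    # dL/d(student logits) = T * (q - p) / batch for the T^2-scaled KL
    g_logits = T * (q - p) / x.shape[0]
    return loss, x.T @ g_logits

def magnitude_prune(W, frac):
    """Zero out the smallest-magnitude `frac` of the weights."""
    k = int(W.size * frac)
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    out = W.copy()
    out[np.abs(out) <= thresh] = 0.0
    return out

# Toy cascade: prune, then distill to recover the teacher's behaviour.
rng = np.random.default_rng(0)
W_t = rng.normal(size=(16, 8))          # "teacher" weights
x = rng.normal(size=(64, 16))           # probe inputs
teacher_logits = x @ W_t
W_s = W_t.copy()                        # student starts as a copy of the teacher
for _round in range(3):
    W_s = magnitude_prune(W_s, 0.2)
    mask = (W_s != 0).astype(W_s.dtype)
    for _ in range(300):
        loss, grad = distill_loss_and_grad(W_s, x, teacher_logits)
        W_s -= 0.5 * grad * mask        # masked update keeps pruned weights at zero
```

In a real LLM setting the pruning would act on structured units (layers, heads, hidden dimensions) and the distillation loss would be computed token-by-token against the parent model's output distribution, but the prune-then-recover loop has the same shape.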