Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

📅 2025-11-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of jointly optimizing generation quality, inference efficiency, and deployment flexibility for high-resolution image and 10-second video synthesis. We propose a unified family of foundation models trained via a multi-stage framework: large-scale pretraining on 6B–19B-parameter architectures, followed by semantic-aware data clustering and filtering, self-supervised fine-tuning, and reinforcement learning–based post-training. This enables joint modeling and accelerated inference for both modalities. The model family comprises lightweight and high-fidelity variants, supporting diverse tasks including text-to-image and image-to-video generation. Human evaluations demonstrate significant improvements over state-of-the-art methods in both fidelity and generation speed. To foster reproducibility and practical adoption, we fully open-source the codebase and model checkpoints.

๐Ÿ“ Abstract
This report introduces Kandinsky 5.0, a family of state-of-the-art foundation models for high-resolution image and 10-second video synthesis. The framework comprises three core model line-ups: Kandinsky 5.0 Image Lite, a line-up of 6B-parameter image generation models; Kandinsky 5.0 Video Lite, fast and lightweight 2B-parameter text-to-video and image-to-video models; and Kandinsky 5.0 Video Pro, 19B-parameter models that achieve superior video generation quality. We provide a comprehensive review of the data curation lifecycle, including collection, processing, filtering, and clustering, for the multi-stage training pipeline, which involves extensive pre-training and incorporates quality-enhancement techniques such as self-supervised fine-tuning (SFT) and reinforcement learning (RL)-based post-training. We also present novel architectural, training, and inference optimizations that enable Kandinsky 5.0 to achieve high generation speed and state-of-the-art performance across various tasks, as demonstrated by human evaluation. As a large-scale, publicly available generative framework, Kandinsky 5.0 leverages the full potential of its pre-training and subsequent stages and can be adapted to a wide range of generative applications. We hope that this report, together with the release of our open-source code and training checkpoints, will substantially advance the development and accessibility of high-quality generative models for the research community.
Problem

Research questions and friction points this paper is trying to address.

Developing high-resolution image and video generation foundation models
Optimizing multi-stage training with data curation and quality enhancement
Enabling scalable generative applications through open-source framework release
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage training with pre-training and quality-enhancement techniques
Novel architectural optimizations for high generation speeds
Family of foundation models for image and video synthesis
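The multi-stage pipeline described above (large-scale pre-training, then data curation, then SFT, then RL-based post-training) can be sketched as a simple staged schedule. Everything below is an illustrative assumption for exposition: the function names, the quality-score threshold, and the stage ordering are hypothetical stand-ins, not the Kandinsky 5.0 codebase.

```python
# Hypothetical sketch of a multi-stage training schedule like the one the
# abstract describes. All names and numbers here are illustrative, not taken
# from the Kandinsky 5.0 implementation.

def curate(dataset):
    """Stand-in for semantic-aware filtering/clustering: keep high-quality samples."""
    return [s for s in dataset if s["quality"] >= 0.8]

def run_pipeline(dataset):
    """Return the ordered stages with the number of samples each one sees."""
    stages = [("pretrain", len(dataset))]        # large-scale pre-training on all data
    sft_data = curate(dataset)                   # curation gates the later stages
    stages.append(("sft", len(sft_data)))        # fine-tuning on the curated subset
    stages.append(("rl_post_train", len(sft_data)))  # RL-based post-training
    return stages

data = [{"quality": q} for q in (0.5, 0.9, 0.95, 0.7)]
print(run_pipeline(data))  # [('pretrain', 4), ('sft', 2), ('rl_post_train', 2)]
```

The point of the sketch is the ordering: later, more expensive quality-enhancement stages operate on a progressively smaller, curated slice of the data rather than the full pre-training corpus.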
👥 Authors

- V. Arkhipkin (KandinskyLab)
- Vladimir Korviakov (KandinskyLab)
- N. Gerasimenko (KandinskyLab)
- Denis Parkhomenko (KandinskyLab)
- Viacheslav Vasilev (MIPT)
- Alexey Letunovskiy (KandinskyLab)
- Maria Kovaleva (Sber AI)
- Nikolai Vaulin (KandinskyLab)
- Ivan Kirillov (KandinskyLab)
- Lev Novitskiy (KandinskyLab)
- Denis Koposov (KandinskyLab)
- Nikita Kiselev (KandinskyLab)
- Alexander Varlamov (Applied AI Institute)
- Dmitrii Mikhailov (KandinskyLab)
- Vladimir Polovnikov (KandinskyLab)
- Andrey Shutkin (KandinskyLab)
- Ilya Vasiliev (KandinskyLab)
- J. Agafonova (KandinskyLab)
- Anastasiia Kargapoltseva (KandinskyLab)
- Anna Dmitrienko (KandinskyLab)
- Anastasia Maltseva (KandinskyLab)
- Anna Averchenkova (KandinskyLab)
- Olga Kim (KandinskyLab)
- T. Nikulina (KandinskyLab)
- Denis Dimitrov (Head of Kandinsky Lab)