🤖 AI Summary
To address the high barriers to fine-tuning large language models (LLMs) and multimodal LLMs (MLLMs), including fragmented toolchains and inadequate multimodal support, this work introduces the first open-source infrastructure to systematically enable lightweight fine-tuning of large multimodal models. The framework supports over 350 LLMs and 80 MLLMs through a modular architecture, dynamic model registration, unified training interfaces, and quantization-aware fine-tuning. It also integrates end-to-end capabilities, including inference (e.g., vLLM, TGI), evaluation, quantization, and deployment. Its key contributions are: (1) the first standardized, production-ready support for MLLM fine-tuning; (2) the broadest model coverage and most comprehensive end-to-end toolchain available; and (3) reproducible evaluation across benchmarks. Experiments on multiple benchmarks demonstrate substantial reductions in development overhead while validating the framework's efficiency, compatibility, and out-of-the-box usability.
📝 Abstract
Recent developments in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have achieved superior performance and generalization capabilities, covering a wide range of traditional tasks. However, existing large-model training frameworks support only a limited number of models and techniques, and in particular lack support for new models, which makes fine-tuning LLMs challenging for most developers. Therefore, we develop SWIFT, a customizable one-stop infrastructure for large models. With support for over 350 LLMs and 80 MLLMs, SWIFT is the open-source framework that provides the most comprehensive support for fine-tuning large models. In particular, it is the first training framework to provide systematic support for MLLMs. Moreover, SWIFT integrates post-training processes such as inference, evaluation, and quantization to facilitate fast adoption of large models in various application scenarios, and offers helpful utilities such as benchmark comparisons among different training techniques.