360-LLaMA-Factory: Plug&Play Sequence Parallelism for Long Post-Training

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address memory constraints and poor scalability in long-sequence post-training of large language models (LLMs), this paper proposes a plug-and-play sequence parallelism (SP) mechanism seamlessly integrated into the LLaMA-Factory framework. Methodologically, the authors design a lightweight plugin architecture that natively supports multi-mode sequence partitioning—namely, split, interleave, and reduce—while implementing efficient gradient synchronization and dynamic communication scheduling in PyTorch. The approach is fully compatible with the Hugging Face ecosystem and requires no model architecture modifications. Experimental results demonstrate substantial GPU memory reduction, enabling long-sequence post-training for models including Light-R1, TinyR1, and Kaggle AIMO mathematical reasoning models. The solution has been adopted as a core component in the proprietary training frameworks of multiple industry-leading enterprises.
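To make the partitioning modes mentioned above concrete, here is a minimal, hypothetical sketch of how token positions of a long sequence could be divided across ranks under "split" (contiguous chunks) and "interleave" (round-robin) schemes. This is illustrative only, not the actual 360-LLaMA-Factory implementation; the function name and signature are invented for the example.

```python
def partition_sequence(seq_len, world_size, mode="split"):
    """Return, for each rank, the token positions that rank would own.

    Hypothetical helper illustrating two partitioning schemes; assumes
    seq_len is divisible by world_size for the "split" mode.
    """
    positions = list(range(seq_len))
    if mode == "split":
        # Contiguous chunks: rank r gets positions [r*chunk, (r+1)*chunk).
        chunk = seq_len // world_size
        return [positions[r * chunk:(r + 1) * chunk] for r in range(world_size)]
    if mode == "interleave":
        # Round-robin: rank r gets positions r, r + world_size, r + 2*world_size, ...
        return [positions[r::world_size] for r in range(world_size)]
    raise ValueError(f"unknown mode: {mode}")

# Example: an 8-token sequence over 2 ranks.
print(partition_sequence(8, 2, "split"))       # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(partition_sequence(8, 2, "interleave"))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Interleaved schemes are commonly used with attention-style workloads to balance per-rank compute when a causal mask makes later positions more expensive than earlier ones.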

📝 Abstract
Adding sequence parallelism into LLaMA-Factory, we open-sourced 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has received wide recognition and has been used in models such as Light-R1 (arXiv:2503.10460), TinyR1 (arXiv:2503.04872), and Kaggle AIMO math models, as well as in large companies' training frameworks. This technical report delves deeper into the different sequence parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.
Problem

Research questions and friction points this paper is trying to address.

Enables sequence parallelism in LLaMA-Factory
Open-sources 360-LLaMA-Factory for model training
Explores sequence parallel modes and implementation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug & play sequence parallelism
Open-sourced 360-LLaMA-Factory toolkit
Multiple sequence parallel modes