🤖 AI Summary
To address memory constraints and poor scalability in long-sequence post-training of large language models (LLMs), this paper proposes a plug-and-play sequence parallelism (SP) mechanism seamlessly integrated into the LLaMA-Factory framework. Methodologically, we design a lightweight plugin architecture that natively supports multi-mode sequence partitioning—namely, split, interleave, and reduce—while implementing efficient gradient synchronization and dynamic communication scheduling via PyTorch. The approach is fully compatible with the Hugging Face ecosystem and requires no model architecture modifications. Experimental results demonstrate substantial GPU memory reduction, enabling long-sequence post-training for models including Light-R1, TinyR1, and the Kaggle AIMO mathematical reasoning model. The solution has been adopted as a core component in proprietary training frameworks by multiple industry-leading enterprises.
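To make the partitioning modes above concrete, here is a minimal, hypothetical sketch of two of them ("split" and "interleave") on a toy token list. The function names, the two-chunk zigzag interleaving, and all shapes are illustrative assumptions, not the actual 360-LLaMA-Factory API.

```python
# Hypothetical sketch of two sequence-partitioning modes; in real sequence
# parallelism each rank would hold its shard as a tensor on its own GPU.

def split_partition(tokens, world_size, rank):
    """Contiguous split: rank r gets the r-th contiguous chunk."""
    chunk = len(tokens) // world_size
    return tokens[rank * chunk:(rank + 1) * chunk]

def interleave_partition(tokens, world_size, rank):
    """Interleaved (zigzag-style) split: rank r gets chunks r and
    (2*world_size - 1 - r), balancing causal-attention work per rank."""
    n = 2 * world_size
    chunk = len(tokens) // n
    first = tokens[rank * chunk:(rank + 1) * chunk]
    second = tokens[(n - 1 - rank) * chunk:(n - rank) * chunk]
    return first + second

seq = list(range(8))  # a toy "sequence" of 8 token ids
print(split_partition(seq, 2, 0))       # rank 0 of 2 -> [0, 1, 2, 3]
print(interleave_partition(seq, 2, 0))  # rank 0 of 2 -> [0, 1, 6, 7]
```

Under the interleaved layout, each rank owns one early and one late chunk of the sequence, which is why zigzag-style schemes are often preferred for causal attention: no rank is stuck with only the long-context tail.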
📝 Abstract
Adding sequence parallelism to LLaMA-Factory, we open-sourced 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has received wide recognition and has been used in models such as Light-R1 (arXiv:2503.10460), TinyR1 (arXiv:2503.04872), and Kaggle AIMO math models, as well as in large companies' training frameworks. This technical report delves deeper into the different sequence-parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.