🤖 AI Summary
Reinforcement Fine-Tuning (RFT) of large language models (LLMs) lacks a general-purpose, flexible, and scalable framework. Method: This paper proposes the first decoupled, three-module unified framework—comprising RFT-Core, an Interaction Engine, and an Optimization Data Pipeline—that enables unified modeling and dynamic switching across synchronous/asynchronous, on-policy/off-policy, and online/offline RFT paradigms. The framework integrates algorithms including PPO, A2C, and DPO, and incorporates distributed training, asynchronous communication, dynamic trajectory sampling and filtering, and RFT-specific data orchestration and caching. Contribution/Results: Experiments demonstrate cross-paradigm compatibility and training stability across diverse LLM tasks; zero-code adaptation to novel scenarios; 37% reduction in inference latency; and a 2.1× improvement in RFT training throughput.
📝 Abstract
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT, (2) seamless integration for agent-environment interaction with high efficiency and robustness, and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted for diverse application scenarios, and serves as a unified platform for exploring advanced reinforcement learning paradigms. This technical report outlines the vision, features, design and implementations of Trinity-RFT, accompanied by extensive examples demonstrating the utility and user-friendliness of the proposed framework.