RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Social network services (SNS) exhibit dynamic multilinguality, high heterogeneity, and strong distributional shift—posing a fundamental trade-off between in-distribution performance and out-of-distribution robustness during supervised fine-tuning (SFT) of large language models (LLMs), especially for smaller-scale models. To address this, we propose a three-stage, reinforcement learning (RL)-dominant progressive post-training framework: (i) exploratory learning on curated SNS corpora; (ii) selective SFT combined with a mixed general-domain data rehearsal mechanism to mitigate catastrophic forgetting; and (iii) RL-based refinement guided by SNS-centric signals. This paradigm pioneers RL-first, stage-wise balanced optimization. On a 4B-parameter model, it achieves +2.41 points over a 7B baseline and +8.74 points over the base model, surpassing prior methods using <50% of their training data—demonstrating substantial gains in data efficiency, training stability, and cross-lingual robustness for small LLMs.

📝 Abstract
As a key medium for human interaction and information exchange, social networking services (SNS) pose unique challenges for large language models (LLMs): heterogeneous workloads, fast-shifting norms and slang, and multilingual, culturally diverse corpora that induce sharp distribution shift. Supervised fine-tuning (SFT) can specialize models but often triggers a "seesaw" between in-distribution gains and out-of-distribution robustness, especially for smaller models. To address these challenges, we introduce RedOne 2.0, an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm designed for rapid and stable adaptation. The pipeline consists of three stages: (1) Exploratory Learning on curated SNS corpora to establish initial alignment and identify systematic weaknesses; (2) Targeted Fine-Tuning that selectively applies SFT to the diagnosed gaps while mixing in a small fraction of general data to mitigate forgetting; and (3) Refinement Learning that re-applies RL with SNS-centric signals to consolidate improvements and harmonize trade-offs across tasks. Across various tasks spanning three categories, our 4B-scale model delivers an average improvement of about 2.41 points over the 7B sub-optimal baseline. Additionally, RedOne 2.0 achieves an average performance lift of about 8.74 points over the base model with less than half the data required by the SFT-centric method RedOne, evidencing superior data efficiency and stability at compact scales. Overall, RedOne 2.0 establishes a competitive, cost-effective baseline for domain-specific LLMs in SNS scenarios, advancing capability without sacrificing robustness.
Problem

Research questions and friction points this paper is trying to address.

Addressing heterogeneous workloads in social networking services
Overcoming fast-shifting norms and slang in SNS environments
Managing multilingual culturally diverse corpora causing distribution shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive RL-prioritized post-training for rapid adaptation
Exploratory Learning on SNS corpora to identify weaknesses
Targeted Fine-Tuning with general data to mitigate forgetting
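The three stages above can be sketched as a simple control flow: diagnose gaps via exploration, patch them with selective SFT plus general-data rehearsal, then consolidate with SNS-signal refinement. This is a minimal, purely illustrative sketch; all function names, thresholds, and the dictionary-based "model" are assumptions for exposition, not the authors' implementation.

```python
# Schematic sketch of the three-stage, RL-prioritized post-training pipeline.
# The model is mocked as per-task skill scores in [0, 1]; every helper and
# constant here is a hypothetical stand-in for the real training machinery.

def exploratory_learning(model, sns_tasks, gap_threshold=0.5):
    """Stage 1: explore SNS tasks to diagnose systematic weaknesses."""
    scores = {task: model["skill"].get(task, 0.0) for task in sns_tasks}
    return [task for task, s in scores.items() if s < gap_threshold]

def targeted_fine_tuning(model, gaps, sft_gain=0.4, general_mix=0.1):
    """Stage 2: selective SFT on diagnosed gaps, rehearsing a small
    fraction of general-domain data to mitigate catastrophic forgetting."""
    for task in gaps:
        model["skill"][task] = model["skill"].get(task, 0.0) + sft_gain
    # Rehearsal: blend general ability toward full strength so SNS-focused
    # updates do not erode it.
    general = model["skill"].get("general", 0.8)
    model["skill"]["general"] = general * (1 - general_mix) + 1.0 * general_mix
    return model

def refinement_learning(model, sns_signal=0.1):
    """Stage 3: RL refinement with SNS-centric rewards to consolidate
    improvements and harmonize trade-offs across tasks."""
    for task in model["skill"]:
        model["skill"][task] = min(1.0, model["skill"][task] + sns_signal)
    return model

def post_train(model, sns_tasks):
    gaps = exploratory_learning(model, sns_tasks)
    model = targeted_fine_tuning(model, gaps)
    return refinement_learning(model)

model = {"skill": {"translation": 0.3, "moderation": 0.7}}
trained = post_train(model, ["translation", "moderation", "slang_norm"])
```

In this toy run, "translation" and "slang_norm" fall below the gap threshold, receive the SFT boost, and all tasks (including rehearsed general ability) get the final refinement bump.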
Fei Zhao
NLP Team, Xiaohongshu Inc., Huangpu District, Shanghai, China
Chonggang Lu
NLP Team, Xiaohongshu Inc., Huangpu District, Shanghai, China
Haofu Qian
NLP Team, Xiaohongshu Inc., Huangpu District, Shanghai, China
Fangcheng Shi
NLP Team, Xiaohongshu Inc., Huangpu District, Shanghai, China
Zijie Meng
NLP Team, Xiaohongshu Inc., Huangpu District, Shanghai, China
Jianzhao Huang
NLP Team, Xiaohongshu Inc., Huangpu District, Shanghai, China
Xu Tang
NLP Team, Xiaohongshu Inc., Huangpu District, Shanghai, China
Zheyong Xie
Xiaohongshu Inc., University of Science and Technology of China
Zheyu Ye
Imperial College London
Zhe Xu
NLP Team, Xiaohongshu Inc., Huangpu District, Shanghai, China
Yao Hu
Zhejiang University
Shaosheng Cao
Xiaohongshu, DiDi Chuxing, Ant Financial, Microsoft Research