🤖 AI Summary
This study investigates how well the reasoning gains from reinforcement post-training (RPT) of large language models generalize across domains, particularly to domains unseen during training.
Method: We conduct systematic evaluations across multiple domains and reasoning paradigms using both observational and interventional experiments: multi-domain comparative analysis of open-weight RPT models against their base models, single-domain RPT fine-tuning, and cross-domain transfer testing.
Results: We identify significant domain heterogeneity in RPT gains. Performance improves markedly in domains that share structural or reasoning-pattern similarities with the RPT source domain, but gains vanish, or even turn negative, in domains requiring fundamentally different reasoning mechanisms. This reveals a critical limitation of current RPT methods in transferring across distinct reasoning paradigms. Our findings provide empirical evidence that RPT lacks robust cross-paradigm generalization, offering insights and direction for developing more broadly applicable post-training techniques.
📝 Abstract
Reinforcement post-training (RPT) has recently shown promise in improving the reasoning abilities of large language models (LLMs). However, it remains unclear how well these improvements generalize to new domains, as prior work evaluates RPT models on data from the same domains used for fine-tuning. To understand the generalizability of RPT, we conduct two studies. (1) Observational: We compare a wide range of open-weight RPT models against their corresponding base models across multiple domains, both seen and unseen in their fine-tuning data. (2) Interventional: We fine-tune LLMs with RPT on single domains and evaluate their performance across multiple domains. Both studies converge on the same conclusion: although RPT brings substantial gains on tasks similar to the fine-tuning data, the gains generalize inconsistently and can vanish on domains with different reasoning patterns.
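The interventional study's core measurement can be sketched as a transfer matrix: for each single-domain RPT model, compute its accuracy delta over the shared base model on every evaluation domain. The sketch below is purely illustrative; the domain names, accuracy numbers, and function are hypothetical placeholders, not results or code from the paper.

```python
def transfer_gains(base_acc, rpt_acc):
    """Per-domain accuracy delta (RPT minus base) for each fine-tuning domain.

    base_acc: {eval_domain: accuracy} for the shared base model.
    rpt_acc:  {ft_domain: {eval_domain: accuracy}} for each RPT model.
    Returns a transfer matrix {ft_domain: {eval_domain: gain}}.
    """
    return {
        ft_domain: {
            eval_domain: round(scores[eval_domain] - base_acc[eval_domain], 3)
            for eval_domain in base_acc
        }
        for ft_domain, scores in rpt_acc.items()
    }

# Illustrative placeholder numbers, not the paper's measurements.
base = {"math": 0.42, "code": 0.38, "logic": 0.45}
rpt = {
    "math": {"math": 0.61, "code": 0.44, "logic": 0.46},  # RPT-tuned on math
    "code": {"math": 0.43, "code": 0.55, "logic": 0.41},  # RPT-tuned on code
}

matrix = transfer_gains(base, rpt)
# Diagonal (in-domain) gains are large; off-diagonal (cross-domain) gains
# shrink or go negative, mirroring the paper's qualitative finding.
```

Reading the matrix row by row shows the pattern the studies report: strong in-domain improvement, inconsistent and sometimes negative transfer elsewhere.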