🤖 AI Summary
Traditional supervised fine-tuning relies solely on instance-level labels and therefore fails to capture relative preference relationships among samples, which limits decision boundary refinement and confidence calibration. To address this limitation, this work proposes ClaHF, a framework that integrates human-feedback-inspired reinforcement learning into text classification without requiring additional human annotations. ClaHF transforms hard labels into preference signals by constructing candidate predictions and their ranking relationships. Its reward model jointly learns the Top-1 preference and the ordering among the remaining, non-optimal candidates, and the resulting preference signals are then used for policy optimization. Experimental results show that ClaHF consistently improves both classification performance and confidence calibration on eight classification tasks spanning three categories of scenarios, across diverse language models.
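The label-to-preference step described above can be illustrated with a minimal sketch. The summary does not say how the non-optimal candidates are ordered, so this example assumes they are ranked by the scores of some reference classifier; `build_ranking`, `ranking_to_pairs`, `reference_scores`, and `k` are hypothetical names chosen for illustration and are not taken from the ClaHF code.

```python
from typing import Dict, List, Tuple


def build_ranking(gold_label: str, reference_scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return up to k candidate labels, ordered from most to least preferred.

    Assumptions (not from the paper): the gold label is always placed first,
    and the other candidates are ordered by a reference model's score.
    """
    others = sorted(
        (label for label in reference_scores if label != gold_label),
        key=lambda label: reference_scores[label],
        reverse=True,
    )
    return [gold_label] + others[: k - 1]


def ranking_to_pairs(ranking: List[str]) -> List[Tuple[str, str]]:
    """Expand a ranking into (preferred, rejected) label pairs."""
    return [
        (ranking[i], ranking[j])
        for i in range(len(ranking))
        for j in range(i + 1, len(ranking))
    ]


# Hypothetical example: the gold label is "sports", even though the
# reference classifier scores "politics" higher for this input.
scores = {"sports": 0.20, "politics": 0.50, "tech": 0.25, "health": 0.05}
ranking = build_ranking("sports", scores, k=3)
pairs = ranking_to_pairs(ranking)
```

Here a single hard label yields one Top-1 preference (the gold label over every other candidate) plus an ordering among the non-optimal candidates, which is the kind of signal a reward model can be trained on.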
📝 Abstract
Text classification models are typically trained via supervised fine-tuning (SFT). However, SFT essentially performs behavior cloning from instance-wise labels and thus fails to adequately capture relative preference relations among samples, which limits the model's ability to shape decision boundaries and calibrate predictive confidence. In this paper, we propose ClaHF, a human feedback-inspired reinforcement learning (RL) framework for text classification that integrates preference modeling and RL optimization into the classification pipeline without requiring additional human annotations. Unlike prior work that relies solely on instance-wise supervision, ClaHF constructs multiple candidate predictions together with their relative ranking relations, and jointly models the Top-1 preference and the ordering among non-optimal candidates within a reward model (RM). This design converts conventional label supervision into preference signals that are directly applicable to policy optimization. We conduct systematic evaluations on eight classification tasks spanning three categories of scenarios. Results demonstrate that ClaHF consistently improves both classification performance and confidence calibration across diverse language models (LMs). The data and code are available at https://anonymous.4open.science/r/ClaHF.
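To make the joint reward-model objective more concrete, here is a minimal sketch of one way a Top-1 term and an ordering term over non-optimal candidates could be combined. The abstract does not give the actual loss, so the softmax form of the Top-1 term, the pairwise logistic form of the ordering term, and the mixing weight `weight` are all assumptions; the function names are hypothetical.

```python
import math
from typing import List


def top1_loss(rewards: List[float]) -> float:
    """Negative log-probability that candidate 0 (the gold label) wins a softmax over rewards."""
    m = max(rewards)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(r - m) for r in rewards))
    return log_z - rewards[0]


def order_loss(rewards: List[float]) -> float:
    """Mean pairwise logistic loss over the non-optimal candidates (indices 1 and up).

    Candidates are assumed to be listed from most to least preferred.
    """
    rest = rewards[1:]
    pairs = [
        (rest[i], rest[j])
        for i in range(len(rest))
        for j in range(i + 1, len(rest))
    ]
    if not pairs:
        return 0.0
    # log(1 + exp(-(hi - lo))) equals -log(sigmoid(hi - lo))
    return sum(math.log1p(math.exp(-(hi - lo))) for hi, lo in pairs) / len(pairs)


def joint_loss(rewards: List[float], weight: float = 0.5) -> float:
    """Assumed combination: Top-1 term plus a weighted ordering term."""
    return top1_loss(rewards) + weight * order_loss(rewards)


# Rewards that agree with the target ranking should incur a lower loss
# than rewards that reverse it.
agreeing = joint_loss([2.0, 1.0, 0.0])
reversed_order = joint_loss([0.0, 1.0, 2.0])
```

In this sketch, lowering the loss pushes the reward of the gold label above all other candidates while also keeping the remaining candidates in their assumed order; a real implementation would compute the same quantities on batched tensors.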