From Classification to Ranking: Enhancing LLM Reasoning Capabilities for MBTI Personality Detection

📅 2026-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language model–based approaches to MBTI personality detection predominantly adopt a classification paradigm and rely on handcrafted prompts, which struggle to capture subtle distinctions among personality traits and lack autonomous reasoning capabilities. This work reframes personality detection as a ranking task and introduces a novel ranking reward function tailored for subjective assessment. By integrating supervised fine-tuning (SFT) with the Group Relative Policy Optimization (GRPO) reinforcement learning framework, the proposed method reduces dependence on expert-designed prompts and enhances the model’s ability to discriminate traits with ambiguous boundaries. Evaluated across multiple MBTI benchmarks, the approach achieves state-of-the-art performance, significantly outperforming current classification-based methods.

📝 Abstract
Personality detection aims to measure an individual's personality traits through their social media posts. Advances in Large Language Models (LLMs) offer novel perspectives on personality detection tasks. Existing approaches enhance personality trait analysis by leveraging LLMs to extract semantic information from textual posts as prompts, followed by training classifiers for categorization. However, accurately classifying personality traits remains challenging due to the inherent complexity of human personality and the subtle distinctions among traits. Moreover, prompt-based methods often depend heavily on expert-crafted knowledge and lack autonomous pattern-learning capacity. To address these limitations, we view personality detection as a ranking task rather than a classification task and propose a corresponding reinforcement learning training paradigm. First, we employ supervised fine-tuning (SFT) to establish personality trait ranking capabilities while enforcing standardized output formats, creating a robust initialization. Subsequently, we introduce Group Relative Policy Optimization (GRPO) with a specialized ranking-based reward function. Unlike verification tasks with definitive solutions, personality assessment involves subjective interpretations and blurred boundaries between trait categories. Our reward function explicitly addresses this challenge by training LLMs to learn optimal answer rankings. Comprehensive experiments demonstrate that our method achieves state-of-the-art performance across multiple personality detection benchmarks.
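The abstract's two key ingredients, a ranking-based reward and GRPO's group-relative advantage normalization, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the reciprocal-rank reward, the function names, and the example completions are all assumptions made for the sketch.

```python
def ranking_reward(predicted_ranking, gold_type):
    """Reward a sampled completion by where the gold MBTI type appears
    in the model's ranked list of candidate types (reciprocal rank).
    Gold ranked first -> 1.0; ranked k-th -> 1/k; absent -> 0.0."""
    try:
        rank = predicted_ranking.index(gold_type) + 1  # 1-based rank
    except ValueError:
        return 0.0
    return 1.0 / rank

def grpo_advantages(rewards):
    """GRPO computes per-completion advantages by normalizing each reward
    against the mean and standard deviation of its group of G samples,
    avoiding a separate learned value model."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = (sum((r - mean) ** 2 for r in rewards) / g) ** 0.5
    if std == 0.0:
        return [0.0] * g  # all completions equally good: no learning signal
    return [(r - mean) / std for r in rewards]

# Hypothetical example: four sampled rankings over MBTI types, gold = "INTJ".
completions = [
    ["INTJ", "INTP", "ENTJ"],  # gold first  -> reward 1.0
    ["INTP", "INTJ", "ENTJ"],  # gold second -> reward 0.5
    ["ENFP", "ESFJ", "INTJ"],  # gold third  -> reward ~0.33
    ["ENFP", "ESFJ", "ISTP"],  # gold absent -> reward 0.0
]
rewards = [ranking_reward(c, "INTJ") for c in completions]
advantages = grpo_advantages(rewards)
```

Unlike a 0/1 classification reward, this kind of graded signal rewards completions that place the correct type near the top even when they miss it outright, which is one way to handle the blurred boundaries between trait categories that the abstract highlights.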
Problem

Research questions and friction points this paper is trying to address.

personality detection
MBTI
Large Language Models
classification
ranking
Innovation

Methods, ideas, or system contributions that make the work stand out.

ranking
reinforcement learning
Large Language Models
personality detection
Group Relative Policy Optimization
Yuan Cao
Institute of Computing Technology, Chinese Academy of Sciences
Feixiang Liu
University of Chinese Academy of Sciences
Xinyue Wang
Institute of Computing Technology, Chinese Academy of Sciences
Yihan Zhu
Beihang University
Hui Xu
Institute of Computing Technology, Chinese Academy of Sciences