🤖 AI Summary
Accurately predicting an individual's Big Five personality traits, facets, and item-level scores from a large volume of self-authored text remains challenging because of LLM input-length limits and semantically irrelevant content.
Method: We propose a semantics-guided text preselection framework that filters the raw text and extracts the segments most relevant to the semantic characteristics of each personality dimension, improving the alignment between the input text and the target personality construct. The approach combines a deep learning prediction model with fine-grained semantic similarity computation to enable end-to-end, goal-directed text selection.
Contribution/Results: On the Stream of Consciousness Essays dataset, the method reduces mean absolute error by 12.7% and improves prediction accuracy across all five traits, demonstrating the effectiveness and generalizability of semantics-driven preselection in computational personality assessment.
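Mean absolute error (MAE) is the average absolute gap between true and predicted scores. The summary does not state whether the 12.7% figure is a relative or an absolute drop; the minimal sketch below assumes it is relative. Function names and toy scores are hypothetical, not data from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_mae, new_mae):
    """Fractional drop of new_mae relative to baseline_mae.

    Assumption: a reported "12.7% reduction" corresponds to a return value of 0.127.
    """
    return (baseline_mae - new_mae) / baseline_mae


# Toy scores on a 1-5 scale (hypothetical, not results from the paper).
y_true = [3.0, 4.0, 2.0, 5.0]
baseline_pred = [2.0, 5.0, 3.0, 4.0]
improved_pred = [2.5, 4.5, 2.5, 4.5]

baseline = mean_absolute_error(y_true, baseline_pred)  # 1.0
improved = mean_absolute_error(y_true, improved_pred)  # 0.5
print(relative_reduction(baseline, improved))  # prints 0.5
```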
📝 Abstract
Predicting an individual's personality from the texts they write is challenging, especially when the volume of text is large. In this paper, we introduce a simple yet effective strategy called targeted preselection of texts (TPoT). TPoT semantically filters the texts before they are passed to a deep learning model, referred to as the BIG5-TPoT model, that is designed to predict a single Big Five personality trait, facet, or item. By selecting only the texts that are semantically relevant to that trait, facet, or item, this strategy both works around the input-length limits of large language models and improves prediction performance on the Stream of Consciousness Essays dataset, lowering Mean Absolute Error and raising accuracy.