TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

📅 2025-05-30

🤖 AI Summary
This study addresses the underexplored problem of improving the social intelligence of large language models (LLMs) through post-training. It introduces a temporal-aware hierarchical cognitive reinforcement learning framework that combines human dual-system cognition (System 1 for intuitive responses, System 2 for deliberative reasoning) with explicit temporal modeling, yielding a layered, time-sensitive approach to social reasoning. Key components include a PPO-driven hierarchical policy network, dynamic temporal attention, multi-stage cognitive state modeling, and post-training on joint datasets coupled with test-time intervention. Evaluated on eight diverse social intelligence benchmarks, the 7B model performs on par with DeepSeek-R1 and OpenAI-O3 and significantly outperforms mainstream baselines. The results indicate that the framework is effective and generalizes across social reasoning tasks.

📝 Abstract
Recently, Large Language Models (LLMs) have made significant progress in IQ-related domains that require careful thinking, such as mathematics and coding. However, enhancing LLMs' cognitive development in social domains, particularly from a post-training perspective, remains underexplored. Recognizing that the social world follows a distinct timeline and requires a richer blend of cognitive modes (from intuitive reactions (System 1) and surface-level thinking to deliberate thinking (System 2)) than mathematics, which primarily relies on System 2 cognition (careful, step-by-step reasoning), we introduce Temporal-aware Hierarchical Cognitive Reinforcement Learning (TimeHC-RL) for enhancing LLMs' social intelligence. In our experiments, we systematically explore improving LLMs' social intelligence and validate the effectiveness of the TimeHC-RL method, through five other post-training paradigms and two test-time intervention paradigms on eight datasets with diverse data patterns. Experimental results reveal the superiority of our proposed TimeHC-RL method compared to the widely adopted System 2 RL method. It gives the 7B backbone model wings, enabling it to rival the performance of advanced models like DeepSeek-R1 and OpenAI-O3. Additionally, the systematic exploration from post-training and test-time interventions perspectives to improve LLMs' social intelligence has uncovered several valuable insights.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' social intelligence post-training
Addressing diverse cognitive modes in social contexts
Improving performance in temporal-aware social interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal-aware Hierarchical Cognitive Reinforcement Learning
Enhancing LLMs' social intelligence
Combining System 1 and System 2 cognition
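
To make the combination of cognitive modes and temporal awareness more concrete, here is a minimal Python sketch of a rule-based reward. It is an illustration under stated assumptions, not the paper's implementation: the `<mode>` and `<answer>` tags, the three mode labels, the reward weights, and the comparison between accuracy on ordered and shuffled event sequences are all hypothetical choices that do not appear in the summary above.

```python
import re
from typing import Optional

# Hypothetical mode labels: the abstract names intuitive reactions (System 1),
# surface-level thinking, and deliberate thinking (System 2), but it does not
# specify any output format. These tags are assumptions for illustration.
MODES = ("intuitive", "surface", "deliberate")


def parse_mode(response: str) -> Optional[str]:
    """Return the cognitive mode declared at the start of the response, if valid."""
    match = re.match(r"\s*<mode>(\w+)</mode>", response)
    if match and match.group(1) in MODES:
        return match.group(1)
    return None


def parse_answer(response: str) -> Optional[str]:
    """Return the text inside the first <answer> tag, stripped of whitespace."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return match.group(1).strip() if match else None


def toy_reward(response: str, gold: str,
               ordered_acc: float, shuffled_acc: float) -> float:
    """Toy reward combining format, correctness, and an assumed temporal bonus.

    ordered_acc / shuffled_acc are assumed to be the fraction of sampled
    responses that are correct when the events in the prompt appear in their
    original order versus a shuffled order. All weights are arbitrary.
    """
    answer = parse_answer(response)
    well_formed = parse_mode(response) is not None and answer is not None
    correct = answer is not None and answer == gold

    reward = 0.0
    if well_formed:
        reward += 0.5   # format reward: a valid mode tag plus an answer tag
    if correct:
        reward += 1.0   # outcome reward: the answer matches the gold label
    if correct and ordered_acc > shuffled_acc:
        reward += 0.25  # temporal bonus: event order helped the model
    return reward


sample = "<mode>deliberate</mode> She left before he arrived. <answer>B</answer>"
print(toy_reward(sample, "B", ordered_acc=0.75, shuffled_acc=0.5))
```

In this sketch the temporal bonus is granted only when the model does better on correctly ordered events than on shuffled ones, which is one plausible way to reward sensitivity to the timeline of a social situation.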