TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the insufficient post-training of social intelligence in large language models (LLMs). Methodologically, it introduces a temporally aware hierarchical cognitive reinforcement learning framework—first integrating human dual-system cognition (System 1 for intuitive responses and System 2 for deliberative reasoning) with explicit temporal modeling to establish a layered, time-sensitive social reasoning paradigm. Key components include a PPO-driven hierarchical policy network, dynamic temporal attention, multi-stage cognitive state modeling, and joint dataset post-training coupled with test-time intervention. Evaluated on eight diverse social intelligence benchmarks, the 7B model achieves performance on par with DeepSeek-R1 and OpenAI-O3, significantly outperforming mainstream baselines. Results demonstrate the framework’s effectiveness and generalizability across social reasoning tasks.
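The summary's "dynamic temporal attention" component can be pictured as ordinary scaled dot-product attention whose scores are additionally penalized by how long ago each social event occurred. The sketch below is illustrative only: the function name, the exponential-decay penalty, and all parameters are assumptions, not the paper's actual formulation.

```python
import numpy as np

def temporal_attention(q, keys, values, timestamps, t_now, decay=0.1):
    """Score each past event against the query, then down-weight stale
    events with a linear time penalty before the softmax.
    Illustrative sketch; the paper's real mechanism may differ."""
    scores = keys @ q / np.sqrt(q.shape[-1])   # scaled dot-product scores
    scores -= decay * (t_now - timestamps)     # older events score lower
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over past events
    return weights @ values                    # time-aware context vector
```

With identical keys, the most recent event receives the largest attention weight, which is the intended time-sensitivity.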

📝 Abstract
Recently, Large Language Models (LLMs) have made significant progress in IQ-related domains that require careful thinking, such as mathematics and coding. However, enhancing LLMs' cognitive development in social domains, particularly from a post-training perspective, remains underexplored. Recognizing that the social world follows a distinct timeline and requires a richer blend of cognitive modes (from intuitive reactions (System 1) and surface-level thinking to deliberate thinking (System 2)) than mathematics, which primarily relies on System 2 cognition (careful, step-by-step reasoning), we introduce Temporal-aware Hierarchical Cognitive Reinforcement Learning (TimeHC-RL) for enhancing LLMs' social intelligence. In our experiments, we systematically explore improving LLMs' social intelligence and validate the effectiveness of the TimeHC-RL method, through five other post-training paradigms and two test-time intervention paradigms on eight datasets with diverse data patterns. Experimental results reveal the superiority of our proposed TimeHC-RL method compared to the widely adopted System 2 RL method. It gives the 7B backbone model wings, enabling it to rival the performance of advanced models like DeepSeek-R1 and OpenAI-O3. Additionally, the systematic exploration from post-training and test-time interventions perspectives to improve LLMs' social intelligence has uncovered several valuable insights.
Problem

Research questions and friction points this paper is trying to address.

Post-training methods for developing LLMs' social intelligence remain underexplored
Social contexts demand a richer blend of cognitive modes than System 2-style deliberation alone
Social interactions unfold along a distinct timeline that models must explicitly account for
Innovation

Methods, ideas, or system contributions that make the work stand out.

TimeHC-RL, a temporal-aware hierarchical cognitive reinforcement learning framework
Explicit temporal modeling of social interactions, realized via dynamic temporal attention
Hierarchical combination of System 1 (intuitive) and System 2 (deliberative) cognition
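The dual-system idea above amounts to a dispatcher: cheap intuitive responses for simple social cues, deliberate multi-step reasoning otherwise. The gating signal, threshold, and both responders below are hypothetical placeholders, not the paper's PPO-driven hierarchical policy network.

```python
def route_cognition(query, complexity_score, threshold=0.5):
    """Hypothetical dual-system dispatcher.
    System 1: a direct, one-shot reply for low-complexity inputs.
    System 2: an explicit multi-step reasoning trace for the rest."""
    if complexity_score < threshold:
        return ("system1", f"Intuitive reply to: {query}")
    steps = [f"step {i}: reason about '{query}'" for i in range(1, 4)]
    return ("system2", " -> ".join(steps))
```

The design choice this illustrates is that the deliberation budget is spent only where the input warrants it, rather than applying System 2 reasoning uniformly as in math-style RL.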
👥 Authors
Guiyang Hou (Zhejiang University)
Xing Gao (Tongyi Lab, Alibaba Group)
Yuchuan Wu (Tongyi Lab, Alibaba Group) · Conversational AI, Large Language Models, Social Intelligence
Xiang Huang (Nanjing University, Tongyi Lab) · KBQA, Instruction Following, Alignment, RL
Wenqi Zhang (Zhejiang University) · Language Model, Multimodal Learning, Embodied Agents
Zhe Zheng (Zhejiang University)
Yongliang Shen (Zhejiang University)
Jialu Du (Zhejiang University)
Fei Huang (Tongyi Lab, Alibaba Group)
Yongbin Li (Tongyi Lab, Alibaba Group)
Weiming Lu (Zhejiang University) · Natural Language Processing, Large Language Models, AGI