Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents

📅 2025-05-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing social intelligence simulation methods struggle to dynamically adjust reasoning depth: they either lack deep reasoning capabilities or enforce uniform, lengthy reasoning chains, resulting in excessive token consumption and rigid agent behavior. To address this, we propose Adaptive Mode Learning (AML), an adaptive reasoning framework tailored for social intelligence simulation. Our approach introduces the first multi-granular modeling of four distinct cognitive modes, ranging from intuitive reaction to deliberate reflection, and designs a context-aware, real-time mode-switching mechanism. Furthermore, we establish a token-efficient reasoning paradigm grounded in reinforcement learning via the Adaptive Mode Policy Optimization (AMPO) algorithm. Evaluated on social intelligence tasks, our method achieves a 15.6% overall performance gain over state-of-the-art methods and outperforms GRPO by 7.0% while reducing reasoning-chain length by 32.8%. Crucially, it significantly enhances human-like adaptability in dynamic reasoning.
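
To make the mode-switching idea concrete, here is a minimal sketch assuming the policy emits an explicit mode tag before each response; the mode names, tag format, and per-mode token budgets are illustrative assumptions, not details taken from the paper.

```python
from enum import IntEnum
import re

class ThinkingMode(IntEnum):
    """Four thinking modes, shallow to deep. Labels here are
    illustrative; the paper's exact mode names may differ."""
    INTUITIVE = 0   # react immediately, no explicit reasoning tokens
    SHALLOW = 1     # brief chain-of-thought before replying
    STRATEGIC = 2   # multi-step reasoning about goals and partners
    DEEP = 3        # long deliberate reflection

def parse_mode(model_output: str) -> ThinkingMode:
    """Assume the policy prefixes its response with a tag like
    <mode>2</mode>; read it back to route the reasoning budget."""
    m = re.search(r"<mode>(\d)</mode>", model_output)
    return ThinkingMode(int(m.group(1))) if m else ThinkingMode.INTUITIVE

# Hypothetical per-mode reasoning-token budgets (not from the paper).
BUDGET = {ThinkingMode.INTUITIVE: 0, ThinkingMode.SHALLOW: 128,
          ThinkingMode.STRATEGIC: 512, ThinkingMode.DEEP: 2048}

print(BUDGET[parse_mode("<mode>1</mode> A quick reply will do here.")])  # 128
```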

📝 Abstract
Effective social intelligence simulation requires language agents to dynamically adjust reasoning depth, a capability notably absent in current approaches. Existing methods either lack this kind of reasoning capability or enforce uniform long chain-of-thought reasoning across all scenarios, resulting in excessive token usage and inappropriate social simulation. In this paper, we propose $\textbf{A}$daptive $\textbf{M}$ode $\textbf{L}$earning ($\textbf{AML}$), which strategically selects from four thinking modes (intuitive reaction $\rightarrow$ deep contemplation) based on real-time context. Our framework's core innovation, the $\textbf{A}$daptive $\textbf{M}$ode $\textbf{P}$olicy $\textbf{O}$ptimization ($\textbf{AMPO}$) algorithm, introduces three key advancements over existing methods: (1) multi-granular thinking mode design, (2) context-aware mode switching across social interactions, and (3) token-efficient reasoning via depth-adaptive processing. Extensive experiments on social intelligence tasks confirm that AML achieves 15.6% higher task performance than state-of-the-art methods. Notably, our method outperforms GRPO by 7.0% with 32.8% shorter reasoning chains. These results demonstrate that context-sensitive thinking-mode selection, as implemented in AMPO, enables more human-like adaptive reasoning than GRPO's fixed-depth approach.
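
The page does not spell out the AMPO objective, but since it is positioned against GRPO, a hedged sketch of a GRPO-style group-relative advantage with an added token-efficiency term may help. The penalty weight `lam` and the reward shaping are assumptions for illustration; the actual algorithm, including how it conditions on the sampled thinking mode, is defined in the paper.

```python
import numpy as np

def group_relative_advantages(rewards, lengths, lam=0.001):
    """GRPO-style group-relative advantage with an illustrative
    token-efficiency penalty. `rewards` and `lengths` come from a
    group of rollouts sampled for the same prompt.

    Sketch only: AMPO's real objective additionally accounts for
    the thinking mode chosen in each rollout.
    """
    shaped = np.asarray(rewards, float) - lam * np.asarray(lengths, float)
    # Standardize within the group, as GRPO does, so that shorter
    # rollouts with equal task reward receive larger advantages.
    return (shaped - shaped.mean()) / (shaped.std() + 1e-8)

adv = group_relative_advantages(rewards=[1.0, 0.0, 1.0, 1.0],
                                lengths=[220, 900, 480, 130])
print(adv.round(3))  # the shortest correct rollout gets the largest advantage
```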
Problem

Research questions and friction points this paper is trying to address.

Dynamic reasoning depth adjustment for social agents
Context-aware adaptive thinking mode selection
Token-efficient reasoning in social intelligence tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Mode Learning (AML) for dynamic reasoning
Context-aware mode switching in social interactions
Token-efficient reasoning via depth-adaptive processing (illustrated in the sketch below)
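
A back-of-the-envelope illustration of why depth-adaptive processing saves tokens; all budgets and the mode mix below are hypothetical values, not results from the paper.

```python
# Expected reasoning tokens per turn: a uniform policy always pays the
# deepest budget, while an adaptive policy mixes modes by context.
budget = {"intuitive": 0, "shallow": 128, "strategic": 512, "deep": 2048}
mode_mix = {"intuitive": 0.4, "shallow": 0.3, "strategic": 0.2, "deep": 0.1}

uniform_cost = budget["deep"]
adaptive_cost = sum(p * budget[m] for m, p in mode_mix.items())
print(uniform_cost, adaptive_cost)                         # 2048 345.6
print(f"{1 - adaptive_cost / uniform_cost:.0%} fewer reasoning tokens")  # 83%
```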