🤖 AI Summary
This study examines how language and embodied perspective taking interact in human collaboration. Grounded in Selman's developmental theory, it investigates PerspAct, a framework that uses the ReAct (Reason and Act) paradigm to have GPT models simulate developmental stages of perspective taking and generate the corresponding internal narratives. An extended director task supports both a qualitative analysis of action selection and a quantitative evaluation of task efficiency. The models reliably generate stage-appropriate internal narratives before task execution, but often shift toward more advanced stages during interaction. Higher stages generally improve collaborative effectiveness, whereas earlier stages yield more variable outcomes in complex scenarios. The main takeaways are that language exchanges appear to help LLMs refine their internal representations, and that internal speech should be evaluated in tasks that combine linguistic and embodied interaction.
📝 Abstract
Language and embodied perspective taking are essential for human collaboration, yet few computational models address both simultaneously. This work investigates the PerspAct system [1], which integrates the ReAct (Reason and Act) paradigm with Large Language Models (LLMs) to simulate developmental stages of perspective taking, grounded in Selman's theory [2]. Using an extended director task, we evaluate GPT's ability to generate internal narratives aligned with specified developmental stages, and assess how these influence collaborative performance both qualitatively (action selection) and quantitatively (task efficiency). Results show that GPT reliably produces developmentally consistent narratives before task execution but often shifts towards more advanced stages during interaction, suggesting that language exchanges help refine internal representations. Higher developmental stages generally enhance collaborative effectiveness, while earlier stages yield more variable outcomes in complex contexts. These findings highlight the potential of integrating embodied perspective taking and language in LLMs to better model developmental dynamics, and they stress the importance of evaluating internal speech during combined linguistic and embodied tasks.