🤖 AI Summary
This study addresses the lack of privacy-preserving and consistent methods for characterizing interpersonal interactions linked to social capital in the built environment, a gap that hinders the evaluation of design interventions. To this end, the authors propose an end-to-end embedded kinesic recognition framework that systematically annotates Ekman and Friesen's five kinesic functions, for the first time, on a multimodal, multi-scenario dyadic interaction dataset (DUET, comprising 12 interaction sessions) and directly infers communicative functions from privacy-friendly skeletal motion data. The approach eliminates the need for handcrafted action-to-function mappings and leverages transfer learning and deep modeling to generalize across subjects and environments. Experiments demonstrate that existing single-person action recognition models are ill-suited for social function identification, whereas the proposed framework significantly outperforms baseline methods in both functional clustering and classification tasks.
📄 Abstract
Social infrastructure and other built environments are increasingly expected to support well-being and community resilience by enabling social interaction. Yet in civil and built-environment research, there is no consistent and privacy-preserving way to represent and measure socially meaningful interaction in these spaces, leaving studies to operationalize "interaction" differently across contexts and limiting practitioners' ability to evaluate whether design interventions are changing the forms of interaction that social capital theory predicts should matter. To address this field-level and methodological gap, we introduce the Dyadic User Engagement DataseT (DUET) and an embedded kinesics recognition framework that operationalize Ekman and Friesen's kinesics taxonomy as a function-level interaction vocabulary aligned with social capital-relevant behaviors (e.g., reciprocity and attention coordination). DUET captures 12 dyadic interactions spanning all five kinesic functions (emblems, illustrators, affect displays, adaptors, and regulators) across four sensing modalities and three built-environment contexts, enabling privacy-preserving analysis of communicative intent through movement. Benchmarking six open-source, state-of-the-art human activity recognition models quantifies the difficulty of communicative-function recognition on DUET and highlights the limitations of ubiquitous monadic, action-level recognition when extended to dyadic, socially grounded interaction measurement. Building on DUET, our recognition framework infers communicative function directly from privacy-preserving skeletal motion without handcrafted action-to-function dictionaries; using a transfer-learning architecture, it reveals structured clustering of kinesic functions and a strong association between representation quality and classification performance while generalizing across subjects and contexts.