🤖 AI Summary
Existing approaches to psychological and social behavior understanding suffer from single-task design, modality fragmentation, and poor generalization. To address these limitations, we introduce Human Behavior Atlas, the first unified multimodal benchmark for behavioral understanding, comprising over 100,000 samples across text, audio, and visual modalities, annotated along four core dimensions: emotion, cognition, pathology, and social interaction. On this benchmark, we develop the OmniSapiens-7B model family, which incorporates a novel Behavior Attention Mechanism (BAM) and combines supervised fine-tuning with reinforcement learning. Experiments demonstrate substantial gains in cross-task transfer and zero-shot generalization, with marked performance improvements on unseen behavioral datasets. Furthermore, behavior descriptors enable finer-grained semantic understanding. This work establishes a scalable multimodal benchmark and a principled technical framework for general-purpose psychological and behavioral modeling.
📝 Abstract
Using intelligent systems to perceive psychological and social behaviors, that is, the underlying affective, cognitive, and pathological states manifested through observable behaviors and social interactions, remains challenging due to their complex, multifaceted, and personalized nature. Existing work that tackles these dimensions through specialized datasets and single-task systems often misses opportunities for scalability, cross-task transfer, and broader generalization. To address this gap, we curate Human Behavior Atlas, a unified benchmark of diverse behavioral tasks designed to support the development of unified models for understanding psychological and social behaviors. Human Behavior Atlas comprises over 100,000 samples spanning text, audio, and visual modalities, covering tasks on affective states, cognitive states, pathologies, and social processes. Our unification efforts can reduce redundancy and cost, enable training to scale efficiently across tasks, and enhance the generalization of behavioral features across domains. On Human Behavior Atlas, we train three models: OmniSapiens-7B SFT, OmniSapiens-7B BAM, and OmniSapiens-7B RL. We show that training on Human Behavior Atlas enables models to consistently outperform existing multimodal LLMs across diverse behavioral tasks. Pretraining on Human Behavior Atlas also improves transfer to novel behavioral datasets, with the targeted use of behavioral descriptors yielding meaningful performance gains.