🤖 AI Summary
Existing theory of mind (ToM) approaches struggle to scale to complex real-world scenarios and are often confined to small grid-world environments. This work proposes HiVAE, a hierarchical variational autoencoder inspired by the human belief-desire-intention cognitive architecture and the first to introduce hierarchical latent variables into scalable ToM modeling. Using a self-supervised alignment strategy to enhance the semantic interpretability of its latent representations, HiVAE significantly outperforms baseline models on a large-scale campus navigation task comprising 3,185 nodes. The model not only improves inference of agents' implicit goals and mental states but also surfaces a critical open challenge: explicitly aligning latent-space structure with genuine psychological states.
📝 Abstract
Theory of mind (ToM) enables AI systems to infer agents' hidden goals and mental states, but existing approaches focus mainly on small, human-interpretable grid-world environments. We introduce HiVAE, a hierarchical variational architecture that scales ToM reasoning to realistic spatiotemporal domains. Inspired by the belief-desire-intention structure of human cognition, our three-level VAE hierarchy achieves substantial performance improvements on a 3,185-node campus navigation task. However, we identify a critical limitation: while our hierarchical structure improves prediction, the learned latent representations lack explicit grounding in actual mental states. We propose self-supervised alignment strategies and present this work to solicit community feedback on grounding approaches.
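The abstract gives no implementation details for the three-level hierarchy, but the general shape of such a model can be sketched. The following is a minimal, hypothetical NumPy illustration of stacked latent levels, where each level is sampled via the standard VAE reparameterization trick and conditions the level below it. The level names (belief, intention, desire), dimensions, and random projections standing in for learned layers are all assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (the VAE reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def linear(x, out_dim):
    """Hypothetical stand-in for a learned linear layer: a fixed random
    projection. A real model would use trained weights."""
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.1
    return x @ w

def hierarchical_encode(obs, dims=(8, 4, 2)):
    """Three stacked latent levels, each conditioned on the previous
    level's sample: belief -> intention -> desire (most abstract).
    Returns the sampled latent at every level."""
    latents = []
    h = obs
    for d in dims:
        mu, log_var = linear(h, d), linear(h, d)
        z = reparameterize(mu, log_var)
        latents.append(z)
        h = z  # the next level conditions on this level's sample
    return latents

obs = rng.standard_normal(16)  # toy observation vector
belief, intention, desire = hierarchical_encode(obs)
print([z.shape for z in (belief, intention, desire)])  # [(8,), (4,), (2,)]
```

The point of the hierarchy is that coarser, slower-changing latents (desire) sit above finer, faster-changing ones (belief); training would add a reconstruction term plus KL terms at each level, which this sketch omits.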