Unsupervised Learning of Inter-Object Relationships via Group Homomorphism

📅 2026-04-22

🤖 AI Summary

This work addresses a limitation of current deep learning models: they rely on statistical correlations in large-scale data and, unlike human infants, struggle to infer environmental structure autonomously from limited experience. To bridge this gap, the paper introduces group homomorphism as a structural inductive bias for unsupervised representation learning, which it presents as a first. The proposed method uses algebraic homomorphism constraints to jointly perform multi-object segmentation and motion-law extraction from dynamic image sequences without any labels. By decomposing pixel-level changes into interpretable transformation components, it maps relative object motions, such as approaching or receding, onto a one-dimensional additive latent space. This yields physically disentangled, interpretable representations intended to emulate the mechanism by which infants internalize regularities in their environment.
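The phrase "one-dimensional additive latent space" can be made concrete with the standard definition of a group homomorphism. As a sketch, assume the relative-motion transformations between two objects form a group $(G, \circ)$ and the latent space is the additive group $(\mathbb{R}, +)$; the symbols $G$, $\varphi$, $g_1$, $g_2$, and $e$ (the identity transformation) are introduced here for illustration and are not taken from the paper.

```latex
\varphi : (G, \circ) \to (\mathbb{R}, +), \qquad
\varphi(g_2 \circ g_1) = \varphi(g_2) + \varphi(g_1) \quad \forall\, g_1, g_2 \in G

% Identity maps to zero:
\varphi(e) = \varphi(e \circ e) = \varphi(e) + \varphi(e) \;\Rightarrow\; \varphi(e) = 0

% Inverses map to negatives:
0 = \varphi(e) = \varphi(g^{-1} \circ g) = \varphi(g^{-1}) + \varphi(g) \;\Rightarrow\; \varphi(g^{-1}) = -\varphi(g)
```

Read in terms of the chasing and evading scenes, performing one relative motion after another corresponds to adding their latent values, "no relative motion" sits at 0, and undoing a motion (for example, receding by exactly as much as one approached) flips the sign.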

📝 Abstract

Current deep learning models achieve high performance by learning statistical correlations from vast datasets, which stands in stark contrast to how humans learn. They lack the flexibility of humans, particularly preverbal infants, to autonomously acquire the underlying structure of the world from limited experience and adapt to novel situations. In this study, we propose an unsupervised representation learning method based on hierarchical relationships in group operations, rather than on statistical independence, with the aim of building a computational model of infant cognitive development. The proposed model features an integrated architecture that simultaneously performs object segmentation and extracts motion laws from dynamic image sequences. By introducing group homomorphisms from algebra as a structural constraint within a neural network, the model separates pixel-level changes into meaningful transformation components, such as translation and deformation. Using interaction scenes (chasing and evading tasks) grounded in developmental science findings, we experimentally demonstrate that the model can segment multiple objects into individual slots without any ground-truth labels. Furthermore, we confirm that relative movements between objects, such as approaching or receding, are accurately mapped onto and structured within a one-dimensional additive latent space. These results suggest that introducing algebraic and geometric constraints, rather than relying solely on statistical correlation learning, allows physically interpretable "disentangled representations" to be acquired. This study contributes to understanding how infants internalize environmental laws as structures and offers a new perspective for building artificial systems with developmental intelligence.
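The homomorphism constraint can also be phrased as a training-style penalty. The sketch below is a hypothetical, pure-Python illustration, not the paper's architecture: it assumes each frame has already been reduced to two 2-D object positions (say, a chaser and an evader) and scores a relational encoder by how far it departs from the composition rule z(t0→t2) = z(t0→t1) + z(t1→t2). The names `additive_encoder`, `squared_encoder`, `homomorphism_penalty`, and the sample `frames` are made up for this example.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical scene representation: positions of two objects (chaser, evader).
Scene = Tuple[Point, Point]


def distance(scene: Scene) -> float:
    """Euclidean distance between the two objects in a scene."""
    (x1, y1), (x2, y2) = scene
    return math.hypot(x2 - x1, y2 - y1)


def additive_encoder(a: Scene, b: Scene) -> float:
    """Hypothetical relational encoder: signed change in inter-object distance.
    Negative means approaching, positive means receding."""
    return distance(b) - distance(a)


def squared_encoder(a: Scene, b: Scene) -> float:
    """Deliberately non-additive encoder for contrast: it discards the sign."""
    return (distance(b) - distance(a)) ** 2


def homomorphism_penalty(
    encoder: Callable[[Scene, Scene], float], frames: Sequence[Scene]
) -> float:
    """Mean squared violation of z(a->c) = z(a->b) + z(b->c) over consecutive
    frame triples: composing two transitions should add their latents."""
    if len(frames) < 3:
        raise ValueError("need at least three frames to form a triple")
    errors = []
    for a, b, c in zip(frames, frames[1:], frames[2:]):
        composed = encoder(a, c)
        summed = encoder(a, b) + encoder(b, c)
        errors.append((composed - summed) ** 2)
    return sum(errors) / len(errors)


# Made-up chasing sequence: the chaser closes in on the evader.
frames = [
    ((0.0, 0.0), (5.0, 0.0)),
    ((1.0, 0.0), (5.5, 0.0)),
    ((2.0, 0.5), (5.0, 1.0)),
    ((3.0, 1.0), (4.0, 1.0)),
]

print(homomorphism_penalty(additive_encoder, frames))  # ~0 (floating-point noise)
print(homomorphism_penalty(squared_encoder, frames))   # clearly positive
```

Any encoder of the form f(b) - f(a) satisfies the constraint exactly because the sums telescope, which is one reason a signed change in inter-object distance fits naturally into a one-dimensional additive space; squaring discards the approach/recede direction and breaks additivity. In a learned model the encoder would be a neural network and a penalty like this one term of the loss; how the paper actually formulates or weights its constraint is not specified here.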

Problem

Research questions and friction points this paper is trying to address.

unsupervised learning

inter-object relationships

cognitive development

disentangled representations

developmental intelligence

Innovation

Methods, ideas, or system contributions that make the work stand out.

group homomorphism

unsupervised learning

disentangled representation