Weakly Supervised Concept Learning for Object-centric Visual Reasoning

📅 2026-05-05
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high annotation cost of two-stage object-centric visual reasoning methods, which need extensive labels to map perceptual inputs to symbolic representations. The authors propose a weakly supervised neuro-symbolic framework that combines slot attention with a variational autoencoder and guides latent dimensions toward human-interpretable concepts using as little as 1% of the labels. The perception outputs are then translated into symbolic background knowledge for downstream reasoning frameworks such as inductive logic programming, decision trees, and Bayesian networks. Evaluated on synthetic and real-world datasets, the approach discovers complex, abstract rules with far fewer labels, remains robust under substantial domain shift, and at 1% supervision outperforms state-of-the-art foundation model baselines in domain generalization.
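The idea of guiding latent dimensions with sparse labels can be illustrated with a small sketch. This is not the authors' loss: the squared-error guidance term, the weights `beta` and `gamma`, and the function names are all assumptions made for illustration. It only shows how a VAE objective could add a concept term that is active for the few labeled examples and vanishes for the unlabeled majority.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def weakly_supervised_loss(recon_error, mu, log_var, concept_targets,
                           beta=1.0, gamma=1.0):
    """Hypothetical per-slot loss: reconstruction + KL + concept guidance.

    concept_targets maps a latent index to its annotated concept value.
    It is empty for unlabeled examples, so the guidance term is then zero.
    """
    kl = kl_to_standard_normal(mu, log_var)
    # Assumed guidance term: squared error on the annotated latent dimensions only.
    guidance = sum((mu[i] - target) ** 2 for i, target in concept_targets.items())
    return recon_error + beta * kl + gamma * guidance


# An unlabeled example contributes only the self-supervised VAE terms.
unlabeled = weakly_supervised_loss(0.5, [0.0, 0.0], [0.0, 0.0], {})
# A labeled example additionally pulls latent dimension 0 toward its concept value.
labeled = weakly_supervised_loss(0.5, [0.0, 0.0], [0.0, 0.0], {0: 1.0})
```

In this sketch the reconstruction and KL terms are computed for every example, while the guidance term touches only the annotated dimensions, which is one simple way to make a small labeled fraction shape an otherwise self-supervised latent space.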
📝 Abstract
Neurosymbolic systems promise to combine deep neural networks' (DNNs') processing of raw sensor inputs with the few-shot performance of symbolic artificial intelligence. Two-stage approaches explicitly decouple DNN-based perception from subsequent rule-based reasoning. This avoids the optimization and interpretability issues of end-to-end differentiable approaches, but requires costly labels for the perception output. This paper introduces an efficient weak supervision scheme for the perception stage to ground its output symbols for logical induction in object-centric reasoning tasks. It combines a slot-based architecture for object-centricity with a Variational Autoencoder (VAE) for self-supervision, competing with concept guidance on latent dimensions for human-interpretable grounding. The resulting predictions are translated into symbolic background knowledge for reasoning frameworks such as Inductive Logic Programming (ILP), Decision Trees, and Bayesian Networks. Our extensive empirical evaluation on synthetic and real-world datasets shows that our approach can discover complex, abstract rules for object-centric reasoning while reducing supervision to as little as 1% of labels and remaining robust even under substantial domain shift. Notably, at 1% supervision it even outperforms state-of-the-art foundation model baselines in domain generalization.
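The translation step from perception output to symbolic background knowledge can be sketched as follows. The abstract does not specify the fact format, so everything here is an assumption for illustration: the function `slots_to_facts`, the predicate names (`contains`, `color`, `shape`), the confidence `threshold`, and the Prolog-style syntax that an ILP system might consume.

```python
def slots_to_facts(scene_id, slots, threshold=0.5):
    """Turn per-slot concept scores into Prolog-style facts (hypothetical format).

    slots is a list with one dict per object slot, mapping a concept name
    (e.g. "color") to a dict of candidate values and their scores.
    """
    facts = []
    for index, concepts in enumerate(slots):
        obj = f"{scene_id}_obj{index}"
        facts.append(f"contains({scene_id}, {obj}).")
        for concept, scores in sorted(concepts.items()):
            # Keep the most likely value, but only if it is confident enough.
            value, score = max(scores.items(), key=lambda kv: kv[1])
            if score >= threshold:
                facts.append(f"{concept}({obj}, {value}).")
    return facts


# One slot with a confident color prediction and an uncertain shape prediction.
slots = [{"color": {"red": 0.9, "blue": 0.1},
          "shape": {"cube": 0.4, "sphere": 0.35}}]
facts = slots_to_facts("scene0", slots)
# facts == ["contains(scene0, scene0_obj0).", "color(scene0_obj0, red)."]
```

Dropping low-confidence predictions, as this sketch does, is one possible design choice: the symbolic learner then sees fewer but more reliable facts. Whether the paper thresholds predictions in this way is not stated in the abstract.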
Problem

Research questions and friction points this paper is trying to address.

Weakly Supervised Learning
Object-centric Reasoning
Neurosymbolic Systems
Symbol Grounding
Visual Concept Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

weak supervision
object-centric representation
neurosymbolic reasoning
variational autoencoder
symbol grounding