Grouped Discrete Representation for Object-Centric Learning

๐Ÿ“… 2024-11-04
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 1
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing VAE-based object-centric learning (OCL) approaches treat features as atomic units and employ scalar discrete codes, leading to inadequate attribute-level similarity modeling and limited representational interpretability. To address this, we propose Grouped Discrete Representation (GDR): a novel discrete encoding paradigm that semantically disentangles latent features into attribute-specific groups and replaces scalar codes with tuple-based indicesโ€”enabling structurally interpretable and attribute-composable representations. GDR integrates channel-wise grouped decomposition with tuple-indexed discretization, and is architecture-agnostic, supporting VAEs, Transformers, and diffusion models. On multiple benchmarks, GDR significantly improves unsupervised object discovery performance. Visualizations confirm its enhanced object separation and reconstruction fidelity. This work establishes the first attribute-decoupled discretization framework for OCL, advancing both interpretability and compositional reasoning in unsupervised representation learning.

Technology Category

Application Category

๐Ÿ“ Abstract
Object-Centric Learning (OCL) can discover objects in images or videos by simply reconstructing the input. For better object discovery, representative OCL methods reconstruct the input as its Variational Autoencoder (VAE) intermediate representation, which suppresses pixel noises and promotes object separability by discretizing continuous super-pixels with template features. However, treating features as units overlooks their composing attributes, thus impeding model generalization; indexing features with scalar numbers loses attribute-level similarities and differences, thus hindering model convergence. We propose extit{Grouped Discrete Representation} (GDR) for OCL. We decompose features into combinatorial attributes via organized channel grouping, and compose these attributes into discrete representation via tuple indexes. Experiments show that our GDR improves both Transformer- and Diffusion-based OCL methods consistently on various datasets. Visualizations show that our GDR captures better object separability.
Problem

Research questions and friction points this paper is trying to address.

Enhance object separability in Object-Centric Learning
Address loss of attribute-level similarities in discrete representations
Improve generalization and convergence in feature decomposition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decompose features into combinatorial attributes
Quantize features via tuple code indexes
Organize channel grouping for better generalization
๐Ÿ”Ž Similar Papers
No similar papers found.