Slot Attention with Re-Initialization and Self-Distillation

📅 2025-07-31

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

Existing Slot Attention methods suffer from two key limitations: (1) slot redundancy—caused by repeated random initialization, leading to over-segmentation of objects—and (2) weak supervision—relying solely on input reconstruction without explicit guidance for internal slot representations. To address these, we propose a dynamic slot reinitialization and cross-iteration self-distillation framework. Specifically, competitive cross-attention identifies redundant slots and resets their features to mitigate over-segmentation; meanwhile, more stable slot representations from prior iterations serve as teacher signals for label-free internal supervision. Our method integrates seamlessly into the standard Slot Attention architecture without introducing additional parameters. Evaluated on CLEVR and Multi-dSprites benchmarks, it achieves state-of-the-art performance across object discovery, segmentation, reconstruction, and visual reasoning tasks—improving segmentation mIoU by 3.2% and reconstruction PSNR by 1.8 dB.

Technology Category

Application Category

📝 Abstract

Unlike popular solutions based on dense feature maps, Object-Centric Learning (OCL) represents visual scenes as sub-symbolic object-level feature vectors, termed slots, which are highly versatile for tasks involving visual modalities. OCL typically aggregates object superpixels into slots by iteratively applying competitive cross attention, known as Slot Attention, with the slots as the query. However, once initialized, these slots are reused naively, causing redundant slots to compete with informative ones for representing objects. This often results in objects being erroneously segmented into parts. Additionally, mainstream methods derive supervision signals solely from decoding slots into the input's reconstruction, overlooking potential supervision based on internal information. To address these issues, we propose Slot Attention with re-Initialization and self-Distillation (DIAS): $emph{i)}$ We reduce redundancy in the aggregated slots and re-initialize extra aggregation to update the remaining slots; $emph{ii)}$ We drive the bad attention map at the first aggregation iteration to approximate the good at the last iteration to enable self-distillation. Experiments demonstrate that DIAS achieves state-of-the-art on OCL tasks like object discovery and recognition, while also improving advanced visual prediction and reasoning. Our code is available on https://github.com/Genera1Z/DIAS.

Problem

Research questions and friction points this paper is trying to address.

Reduces slot redundancy in Object-Centric Learning

Improves object segmentation accuracy in Slot Attention

Enhances supervision via self-distillation and re-initialization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Re-initialization reduces slot redundancy

Self-distillation improves attention maps

Combines re-initialization and self-distillation

🔎 Similar Papers

MIDAS: Multi-level Intent, Domain, And Slot Knowledge Distillation for Multi-turn NLU