FORLA: Federated Object-centric Representation Learning with Slot Attention

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of learning generalizable representations from heterogeneous, unlabeled visual data in federated learning (FL), this paper proposes FORLA, an object-centric, unsupervised federated representation learning framework. Methodologically, it introduces slot attention into FL for the first time, employing a dual-branch teacher-student architecture that jointly optimizes a shared feature adapter and a shared slot attention module to achieve object-level feature disentanglement and unsupervised cross-client alignment. Key contributions include: (1) establishing a distributed object-centric representation paradigm; (2) enabling lightweight adaptation of foundation models and cross-domain feature distillation; and (3) achieving semantically consistent representation alignment without any labels. Extensive experiments on multiple real-world federated datasets demonstrate that the method significantly outperforms centralized baselines and exhibits superior generalization, compactness, and universality on downstream tasks such as object discovery.

📝 Abstract
Learning efficient visual representations across heterogeneous unlabeled datasets remains a central challenge in federated learning. Effective federated representations require features that are jointly informative across clients while disentangling domain-specific factors without supervision. We introduce FORLA, a novel framework for federated object-centric representation learning and feature adaptation across clients using unsupervised slot attention. At the core of our method is a shared feature adapter, trained collaboratively across clients to adapt features from foundation models, and a shared slot attention module that learns to reconstruct the adapted features. To optimize this adapter, we design a two-branch student-teacher architecture. In each client, a student decoder learns to reconstruct full features from foundation models, while a teacher decoder reconstructs their adapted, low-dimensional counterpart. The shared slot attention module bridges cross-domain learning by aligning object-level representations across clients. Experiments on multiple real-world datasets show that our framework not only outperforms centralized baselines on object discovery but also learns a compact, universal representation that generalizes well across domains. This work highlights federated slot attention as an effective tool for scalable, unsupervised visual representation learning from cross-domain data with distributed concepts.
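The mechanism the abstract builds on is Slot Attention (Locatello et al., 2020): a fixed set of slot vectors competes for input features via a softmax over the slot axis, then each slot is updated from the features it won. A minimal NumPy sketch of that core iteration, simplified to weighted-mean updates (the full method also applies a learned GRU and MLP per iteration; all names below are ours, not from the paper's code):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, num_slots=4, num_iters=3, seed=0):
    """Simplified Slot Attention core loop.

    inputs: (n, d) array of per-location features.
    Returns (num_slots, d) slot vectors, each a convex combination
    of the input features it attended to.
    """
    n, d = inputs.shape
    rng = np.random.default_rng(seed)
    slots = rng.normal(size=(num_slots, d))  # random slot initialization
    scale = d ** -0.5
    for _ in range(num_iters):
        # logits: (num_slots, n); softmax over the SLOT axis makes
        # slots compete for each input feature
        attn = softmax(scale * slots @ inputs.T, axis=0)
        # renormalize over inputs so each slot update is a weighted mean
        attn = attn / attn.sum(axis=1, keepdims=True)
        slots = attn @ inputs
    return slots
```

The softmax over slots (rather than over inputs, as in standard attention) is what produces the object-level decomposition: each input location is claimed by a few slots, so slots specialize to distinct objects.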
Problem

Research questions and friction points this paper is trying to address.

Learning visual representations across heterogeneous unlabeled datasets in federated learning
Disentangling domain-specific factors without supervision in federated representations
Generalizing object-centric representations across domains with distributed concepts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated object-centric learning with slot attention
Shared feature adapter and slot attention module
Two-branch student-teacher architecture for optimization
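Since the feature adapter and the slot attention module are shared across clients, their parameters must be aggregated server-side each communication round. The paper card does not state the aggregation rule, but a standard size-weighted FedAvg step is a plausible sketch (function and parameter names are assumptions):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Size-weighted average of per-client parameter dicts (FedAvg).

    client_weights: list of {param_name: np.ndarray}, one dict per client.
    client_sizes:   number of local training samples per client.
    """
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {
        k: sum((n / total) * w[k] for w, n in zip(client_weights, client_sizes))
        for k in keys
    }

# usage: two clients, client 1 holding 3x more data than client 0
clients = [{"w": np.array([0.0, 2.0])}, {"w": np.array([2.0, 4.0])}]
avg = fedavg(clients, client_sizes=[1, 3])
```

In FORLA's setting, only the lightweight adapter and slot-attention parameters would travel in each round; the foundation-model backbone stays frozen on every client, which keeps the communication cost small.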
Guiqiu Liao
University of Pennsylvania
Surgical robotics · Computer vision · Machine learning

M. Jogan
PCASO Laboratory, Department of Surgery, University of Pennsylvania

Eric Eaton
University of Pennsylvania
Artificial intelligence · Machine learning · Continual learning · Robotics · Medicine

Daniel A. Hashimoto
PCASO Laboratory, Department of Surgery, University of Pennsylvania & Department of Computer and Information Science, University of Pennsylvania