🤖 AI Summary
Existing motion foundation models primarily target everyday activity classification and struggle to generalize to diverse wearable-signal tasks that rely critically on time-frequency characteristics. To address this, we propose SlotFM, a motion foundation model based on slot attention that jointly models local temporal structure and spectral patterns via a novel time-frequency slot attention mechanism. This mechanism produces multiple embedding slots, each specialized for distinct signal components. Additionally, we introduce a dual-regularized reconstruction loss to improve fine-grained signal reconstruction fidelity and representation generality. Evaluated on 16 downstream classification and regression tasks, SlotFM significantly outperforms state-of-the-art self-supervised methods on 13 tasks and matches the best methods on the remaining three, yielding a 4.5% average improvement and demonstrating strong generalizability and multi-task adaptability.
📝 Abstract
Wearable accelerometers are used for a wide range of applications, such as gesture recognition, gait analysis, and sports monitoring. Yet most existing foundation models focus primarily on classifying common daily activities such as locomotion and exercise, limiting their applicability to the broader range of tasks that rely on other signal characteristics. We present SlotFM, an accelerometer foundation model that generalizes across diverse downstream tasks. SlotFM uses Time-Frequency Slot Attention, an extension of Slot Attention that processes both time and frequency representations of the raw signals. It generates multiple small embeddings (slots), each capturing different signal components, enabling task-specific heads to focus on the most relevant parts of the data. We also introduce two loss regularizers that capture local structure and frequency patterns, which improve reconstruction of fine-grained details and help the embeddings preserve task-relevant information. We evaluate SlotFM on 16 classification and regression downstream tasks that extend beyond standard human activity recognition. It outperforms existing self-supervised approaches on 13 of these tasks and achieves results comparable to the best-performing approaches on the remaining three. On average, our method yields a 4.5% performance gain, demonstrating strong generalization for sensing foundation models.
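The abstract does not spell out the attention mechanism, but the core idea of Slot Attention applied to mixed time- and frequency-domain tokens can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification (no learned projections or GRU updates, all names invented here), not SlotFM's actual implementation: slots compete for input tokens via a softmax over the slot axis, then are updated as a weighted mean of the tokens they win.

```python
import numpy as np

def slot_attention(tokens, num_slots=4, iters=3, seed=0):
    """Simplified Slot Attention (hypothetical sketch, not SlotFM's exact method).

    tokens: (n_tokens, d) array, e.g. time-domain patches concatenated
            with frequency-domain (FFT-magnitude) patches.
    Returns: (num_slots, d) slot embeddings, each summarizing the
             signal components it attracted.
    """
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    slots = rng.normal(size=(num_slots, d))          # random slot initialization
    for _ in range(iters):
        logits = tokens @ slots.T / np.sqrt(d)       # (n, num_slots) similarities
        # softmax over slots: slots compete for each token
        attn = np.exp(logits - logits.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)
        # normalize per slot over tokens, then take the weighted mean
        w = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = w.T @ tokens                         # (num_slots, d) update
    return slots

# Hypothetical usage: tokenize one accelerometer window in both domains.
t = np.linspace(0, 2, 200)
sig = np.sin(2 * np.pi * 5 * t)                      # toy 5 Hz motion signal
time_tokens = sig.reshape(20, 10)                    # 20 time patches of 10 samples
freq_tokens = np.abs(np.fft.rfft(sig))[:100].reshape(10, 10)  # spectral patches
tokens = np.concatenate([time_tokens, freq_tokens], axis=0)   # (30, 10)
slots = slot_attention(tokens, num_slots=4)          # (4, 10) slot embeddings
```

In this toy setup, downstream task heads would attend over the four slot embeddings rather than one pooled vector, which is the property the abstract attributes to SlotFM's task-specific heads.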