Rotary Masked Autoencoders are Versatile Learners

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Irregular time-series modeling typically requires custom architectures, resulting in high complexity and poor generalization across modalities. Method: This paper proposes the Rotary Masked Autoencoder (RoMAE), the first framework to extend Rotary Position Embedding (RoPE) to multidimensional continuous positional modeling within a masked autoencoding paradigm, eliminating the need for time-series-specific architectural designs. RoMAE unifies irregular signal representation across time series, images, and audio under a single architecture. Contribution/Results: Theoretically, the authors identify the failure condition of RoPE's relative position invariance when learned embeddings participate in the input sequence. Empirically, RoMAE surpasses specialized models on challenging irregular time-series benchmarks (e.g., the DESC ELAsTiCC Challenge) while matching standard MAE performance on image and audio tasks, demonstrating cross-modal generality and effective continuous positional reconstruction.

📝 Abstract
Applying Transformers to irregular time-series typically requires specializations to their baseline architecture, which can result in additional computational overhead and increased method complexity. We present the Rotary Masked Autoencoder (RoMAE), which utilizes the popular Rotary Positional Embedding (RoPE) method for continuous positions. RoMAE is an extension to the Masked Autoencoder (MAE) that enables representation learning with multidimensional continuous positional information while avoiding any time-series-specific architectural specializations. We showcase RoMAE's performance on a variety of modalities including irregular and multivariate time-series, images, and audio, demonstrating that RoMAE surpasses specialized time-series architectures on difficult datasets such as the DESC ELAsTiCC Challenge while maintaining MAE's usual performance across other modalities. In addition, we investigate RoMAE's ability to reconstruct the embedded continuous positions, demonstrating that including learned embeddings in the input sequence breaks RoPE's relative position property.
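The abstract's key mechanism is applying RoPE at continuous (rather than integer) positions, and its key observation is RoPE's relative-position property: the attention score between a rotated query and key depends only on the difference of their positions. A minimal NumPy sketch can illustrate this; it is not the authors' implementation, and the function name and frequency base are assumptions following the standard RoPE formulation.

```python
import numpy as np

def rope_rotate(x, t, base=10000.0):
    """Rotate x pairwise by angles t * theta_i (standard RoPE), where the
    position t may be any real number, e.g. an irregular timestamp."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # one frequency per 2-D pair
    ang = t * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
# Relative-position property: the dot product depends only on t_q - t_k,
# so shifting both continuous positions by the same amount changes nothing.
a = rope_rotate(q, 1.3) @ rope_rotate(k, 0.5)  # offset 0.8
b = rope_rotate(q, 5.3) @ rope_rotate(k, 4.5)  # offset 0.8
print(np.isclose(a, b))  # True
```

Because the rotation acts only on queries and keys, this property breaks as soon as position-dependent learned embeddings enter the input sequence itself, which is the failure condition the paper investigates.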
Problem

Research questions and friction points this paper is trying to address.

Extends MAE for continuous positional learning without specialization
Handles irregular time-series without added complexity or overhead
Evaluates RoMAE on diverse data types including images and audio
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Rotary Positional Embedding for continuous positions
Extends Masked Autoencoder for multidimensional learning
Avoids time-series-specific architectural specializations
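The MAE side of the recipe is plain random masking, with the difference that each token carries a continuous timestamp instead of a grid index. A rough sketch of that setup, with all shapes and the mask ratio chosen for illustration only:

```python
import numpy as np

# Hypothetical MAE-style masking over an irregularly sampled sequence.
rng = np.random.default_rng(0)
n, d, mask_ratio = 10, 4, 0.75
times = np.sort(rng.uniform(0.0, 5.0, size=n))   # irregular timestamps
tokens = rng.normal(size=(n, d))

n_keep = int(n * (1 - mask_ratio))
perm = rng.permutation(n)
keep_idx = np.sort(perm[:n_keep])                # visible to the encoder
mask_idx = np.sort(perm[n_keep:])                # to be reconstructed

# The encoder sees only the visible tokens together with their timestamps;
# position enters via RoPE rotations, not learned index embeddings.
visible_tokens, visible_times = tokens[keep_idx], times[keep_idx]
print(visible_tokens.shape)  # (2, 4)
```

Nothing here is specific to time series: replacing `times` with 2-D patch coordinates gives the image case, which is how a single architecture covers the modalities listed above.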
Uros Zivanovic
University of Trieste, Italy
Serafina Di Gioia
Abdus Salam International Centre for Theoretical Physics (ICTP), Italy
Andre Scaffidi
Scuola Internazionale Superiore di Studi Avanzati (SISSA), Italy
Martín de los Ríos
Scuola Internazionale Superiore di Studi Avanzati (SISSA), Italy
Gabriella Contardo
University of Nova Gorica, Slovenia
Roberto Trotta
Professor of Theoretical Physics and Head of Data Science, SISSA, Trieste
Cosmology · Bayesian methods · Dark Matter/Energy · Machine Learning · AI