🤖 AI Summary
Modeling dependencies in high-dimensional, irregular, large-scale scientific spatiotemporal data (such as sparse sensor observations and high-fidelity simulations) remains challenging due to fragmented interpolation-reconstruction-prediction pipelines, poor scalability, and difficulty capturing complex dependencies. This paper proposes SCENT, a unified Transformer-based framework that addresses these issues. Its core contributions are: (1) a novel learnable-query mechanism with query-level cross-scale cross-attention, enabling joint modeling of multi-granularity spatiotemporal dependencies; and (2) the integration of sparse attention with conditional neural fields to support efficient, resolution-agnostic inference. The architecture follows an encoder-processor-decoder design, enabling end-to-end interpolation, reconstruction, and forecasting. Evaluated on multiple scientific modeling tasks, the method achieves state-of-the-art performance and significantly improves training and inference efficiency, generalization, and robustness, especially on large-scale datasets.
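The paper does not spell out the learnable-query mechanism in this summary, but the general idea (a fixed set of learned latent queries cross-attending to an irregular set of input points, Perceiver-style) can be sketched minimally in plain Python. All names and sizes here (`cross_attention`, `num_queries`, the toy dimensions) are illustrative assumptions, not SCENT's actual implementation:

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Each learnable query attends over all input tokens.
    queries: M x d, keys/values: N x d. Returns M x d latent summaries,
    so the latent size is fixed no matter how many input points arrive."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(d)])
    return out

random.seed(0)
d, num_queries, num_points = 4, 3, 7  # hypothetical toy sizes
latent_queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_queries)]
points = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_points)]
latents = cross_attention(latent_queries, points, points)
# latents is num_queries x d regardless of how many (irregular) input points there were
```

The appeal of this pattern for irregular scientific data is that compute in the processor scales with the number of learned queries, not with the (potentially huge, variable) number of observations.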
📝 Abstract
Spatiotemporal learning is challenging due to the intricate interplay between spatial and temporal dependencies, the high dimensionality of the data, and scalability constraints. These challenges are further amplified in scientific domains, where data is often irregularly distributed (e.g., missing values from sensor failures) and high-volume (e.g., high-fidelity simulations), posing additional computational and modeling difficulties. In this paper, we present SCENT, a novel framework for scalable and continuity-informed spatiotemporal representation learning. SCENT unifies interpolation, reconstruction, and forecasting within a single architecture. Built on a transformer-based encoder-processor-decoder backbone, SCENT introduces learnable queries to enhance generalization and a query-wise cross-attention mechanism to effectively capture multi-scale dependencies. To ensure scalability in both data size and model complexity, we incorporate a sparse attention mechanism, enabling flexible output representations and efficient evaluation at arbitrary resolutions. We validate SCENT through extensive simulations and real-world experiments, demonstrating state-of-the-art performance across multiple challenging tasks while achieving superior scalability.
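The abstract's claim of "flexible output representations and efficient evaluation at arbitrary resolutions" suggests a conditional-neural-field-style decoder: embed any continuous query coordinate and read the field out of the latents, with no fixed output grid. A minimal sketch, assuming Fourier-feature coordinate embeddings and a softmax readout (both illustrative choices; `decode_at`, `fourier_features`, and all sizes are hypothetical):

```python
import math

def fourier_features(coord, freqs):
    """Encode a continuous coordinate with sin/cos features."""
    feats = []
    for c in coord:
        for f in freqs:
            feats.append(math.sin(f * c))
            feats.append(math.cos(f * c))
    return feats

def decode_at(coord, latents, freqs):
    """Resolution-agnostic readout: score each latent against the coordinate
    embedding, softmax the scores, and return the weighted latent mixture."""
    q = fourier_features(coord, freqs)
    d = min(len(q), len(latents[0]))
    scores = [sum(q[i] * z[i] for i in range(d)) for z in latents]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    tot = sum(w)
    w = [x / tot for x in w]
    return [sum(wi * z[j] for wi, z in zip(w, latents)) for j in range(len(latents[0]))]

freqs = [1.0, 2.0, 4.0]
latents = [[0.1 * (i + j) for j in range(12)] for i in range(3)]  # toy latent set
# query the field at any continuous (x, t), on or off any training grid
y = decode_at((0.25, 0.8), latents, freqs)
```

Because the decoder takes raw coordinates rather than grid indices, the same trained latents can be evaluated densely for high-resolution reconstruction or sparsely at sensor locations, which is what makes interpolation, reconstruction, and forecasting expressible in one architecture.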