Task-Agnostic Fusion of Time Series and Imagery for Earth Observation

📅 2025-10-27
🤖 AI Summary
This paper addresses the fusion of time series with single-timestamp remote sensing imagery and proposes a task-agnostic framework that learns a unified multimodal representation. The method has two main components: (1) quantization of time series into discrete tokens, for which both deterministic and learned strategies are explored, and (2) a masked correlation learning objective that aligns discrete image and time series tokens in a shared representation space. The pretrained model supports cross-modal generation, for example producing consistent global temperature profiles from satellite imagery (validated through counterfactual experiments), as well as transfer to downstream tasks. Across downstream tasks, the task-agnostic pretraining outperforms task-specific fusion by 6% in R² and 2% in RMSE on average, and exceeds baseline methods by 50% in R² and 12% in RMSE. The paper also analyzes gradient sensitivity across modalities to give insight into model robustness.

📝 Abstract
We propose a task-agnostic framework for multimodal fusion of time series and single timestamp images, enabling cross-modal generation and robust downstream performance. Our approach explores deterministic and learned strategies for time series quantization and then leverages a masked correlation learning objective, aligning discrete image and time series tokens in a unified representation space. Instantiated in the Earth observation domain, the pretrained model generates consistent global temperature profiles from satellite imagery and is validated through counterfactual experiments. Across downstream tasks, our task-agnostic pretraining outperforms task-specific fusion by 6% in R$^2$ and 2% in RMSE on average, and exceeds baseline methods by 50% in R$^2$ and 12% in RMSE. Finally, we analyze gradient sensitivity across modalities, providing insights into model robustness. Code, data, and weights will be released under a permissive license.
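
To make the idea of deterministic time series quantization concrete, here is a minimal sketch of one common scheme, uniform binning, which maps each real-valued observation to an integer token id. The abstract does not specify the quantizers the authors use, so the function names (`quantize_uniform`, `dequantize_uniform`), the bin count, and the value range are all illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def quantize_uniform(series, num_bins, lo, hi):
    """Map real values to integer token ids in [0, num_bins - 1].

    Assumption: equal-width bins over a fixed range [lo, hi]; values
    outside the range are clipped to the nearest boundary bin.
    """
    series = np.asarray(series, dtype=float)
    clipped = np.clip(series, lo, hi)
    # num_bins - 1 interior cut points between lo and hi.
    edges = np.linspace(lo, hi, num_bins + 1)[1:-1]
    return np.digitize(clipped, edges)

def dequantize_uniform(tokens, num_bins, lo, hi):
    """Map token ids back to the centre of their bin."""
    width = (hi - lo) / num_bins
    return lo + (np.asarray(tokens) + 0.5) * width

# Hypothetical temperature readings in degrees Celsius.
temperatures = [-10.0, 0.0, 10.0, 40.0]
tokens = quantize_uniform(temperatures, num_bins=5, lo=-10.0, hi=40.0)
reconstructed = dequantize_uniform(tokens, num_bins=5, lo=-10.0, hi=40.0)
```

For values inside the range, the reconstruction error of this scheme is at most half a bin width. A learned quantizer would instead fit its codebook to the data, trading this simple guarantee for a better match to the observed distribution.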
Problem

Research questions and friction points this paper is trying to address.

Fusing time series and imagery for Earth observation tasks
Aligning discrete image and time series tokens in unified space
Enabling cross-modal generation and robust downstream performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-agnostic multimodal fusion of time series and images
Masked correlation learning aligns discrete tokens across modalities
Pretrained model generates temperature profiles from satellite imagery
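
As an illustration of how discrete tokens from two modalities can share one sequence for a masked objective, the sketch below concatenates image and time series tokens into a joint vocabulary and hides a random subset of positions. This is generic masked-token modeling, not the paper's masked correlation learning objective, whose exact form is not given here; `build_masked_sequence`, the `-1` mask id, and the vocabulary size are hypothetical.

```python
import numpy as np

def build_masked_sequence(image_tokens, series_tokens, image_vocab_size,
                          mask_ratio, rng):
    """Join two token streams and mask a random subset of positions.

    Assumption: time series ids are shifted by image_vocab_size so both
    modalities occupy disjoint ranges of a single shared vocabulary.
    """
    image_tokens = np.asarray(image_tokens)
    series_tokens = np.asarray(series_tokens) + image_vocab_size
    targets = np.concatenate([image_tokens, series_tokens])

    # Choose which positions the model would have to predict.
    num_masked = int(round(mask_ratio * targets.size))
    positions = rng.choice(targets.size, size=num_masked, replace=False)
    is_masked = np.zeros(targets.size, dtype=bool)
    is_masked[positions] = True

    mask_id = -1  # hypothetical placeholder id for hidden positions
    inputs = np.where(is_masked, mask_id, targets)
    return inputs, targets, is_masked

rng = np.random.default_rng(0)
inputs, targets, is_masked = build_masked_sequence(
    image_tokens=[3, 1, 2, 0],
    series_tokens=[0, 4, 1, 2],
    image_vocab_size=8,
    mask_ratio=0.5,
    rng=rng,
)
```

A model trained on such sequences predicts `targets` at the masked positions from the visible ones. Masking every time series position at inference time would correspond to generating a series from imagery alone.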