A Multimodal Approach to Heritage Preservation in the Context of Climate Change

📅 2025-10-15

🤖 AI Summary

Climate change accelerates the degradation of cultural heritage sites, yet conventional single-modal monitoring fails to capture the complex interplay between environmental stressors and material deterioration. To address this, we propose a lightweight multimodal fusion architecture that jointly uses environmental sensor data (temperature and humidity) and visual imagery to assess degradation severity from very few training samples (n=37). Our approach reduces PerceiverIO to a 64-dimensional latent space to mitigate overfitting and introduces an Adaptive Barlow Twins loss that encourages cross-modal complementarity while suppressing redundancy. A systematic hyperparameter search over the correlation target τ tunes the cross-modal alignment strength. Evaluated on the Strasbourg Cathedral dataset, our method achieves 76.9% accuracy, a 43% improvement over standard multimodal baselines (VisualBERT, Transformer), and surpasses the unimodal sensor-only (61.5%) and image-only (46.2%) models by 15.4 and 30.7 percentage points, respectively. These results suggest that the approach is well suited to data-scarce heritage conservation settings.

📝 Abstract

Cultural heritage sites face accelerating degradation due to climate change, yet traditional monitoring relies on unimodal analysis (visual inspection or environmental sensors alone) that fails to capture the complex interplay between environmental stressors and material deterioration. We propose a lightweight multimodal architecture that fuses sensor data (temperature, humidity) with visual imagery to predict degradation severity at heritage sites. Our approach adapts PerceiverIO with two key innovations: (1) simplified encoders (64D latent space) that prevent overfitting on small datasets (n=37 training samples), and (2) Adaptive Barlow Twins loss that encourages modality complementarity rather than redundancy. On data from Strasbourg Cathedral, our model achieves 76.9% accuracy, a 43% improvement over standard multimodal architectures (VisualBERT, Transformer) and 25% over vanilla PerceiverIO. Ablation studies reveal that sensor-only achieves 61.5% while image-only reaches 46.2%, confirming successful multimodal synergy. A systematic hyperparameter study identifies an optimal moderate correlation target (τ = 0.3) that balances alignment and complementarity, achieving 69.2% accuracy compared to other τ values (τ = 0.1/0.5/0.7: 53.8%, τ = 0.9: 61.5%). This work demonstrates that architectural simplicity combined with contrastive regularization enables effective multimodal learning in data-scarce heritage monitoring contexts, providing a foundation for AI-driven conservation decision support systems.
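The abstract does not give the formula for the Adaptive Barlow Twins loss, so the following is only a minimal sketch of one plausible reading. The standard Barlow Twins objective pushes the diagonal of the cross-correlation matrix between two embeddings toward 1 and the off-diagonal entries toward 0. Here the diagonal target is assumed to be the correlation target τ instead of 1, so that the sensor and image embeddings are only partially aligned. The function names, the weight `lam`, and the way τ enters the loss are all assumptions, not the authors' implementation.

```python
import numpy as np

def standardize(z, eps=1e-8):
    """Zero-mean, unit-variance per feature along the batch axis."""
    return (z - z.mean(axis=0)) / (z.std(axis=0) + eps)

def cross_correlation(z_a, z_b):
    """Feature-by-feature cross-correlation matrix of two (batch, dim) embeddings."""
    n = z_a.shape[0]
    return standardize(z_a).T @ standardize(z_b) / n

def target_barlow_twins_loss(z_sensor, z_image, tau=0.3, lam=5e-3):
    """Barlow Twins style loss with an assumed diagonal target tau.

    tau=1.0 recovers the standard Barlow Twins alignment term.
    lam is a hypothetical weight on the redundancy (off-diagonal) term.
    """
    c = cross_correlation(z_sensor, z_image)
    diag = np.diag(c)
    on_diag = np.sum((diag - tau) ** 2)            # pull matched features toward tau
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # decorrelate mismatched features
    return on_diag + lam * off_diag

# Toy usage with random stand-ins for the two 64D modality embeddings.
rng = np.random.default_rng(0)
z_sensor = rng.normal(size=(37, 64))
z_image = rng.normal(size=(37, 64))
loss = target_barlow_twins_loss(z_sensor, z_image, tau=0.3)
```

Under this reading, a small τ rewards embeddings that share little information, while τ close to 1 rewards near-duplicate embeddings. A moderate value sits between the two, which matches the abstract's description of balancing alignment and complementarity.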

Problem

Research questions and friction points this paper is trying to address.

Predicting degradation severity at cultural heritage sites using multimodal data

Overcoming limitations of unimodal monitoring methods for climate change impacts

Enabling effective multimodal learning with small datasets in heritage preservation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses sensor data with visual imagery

Simplifies encoders to prevent overfitting

Uses Adaptive Barlow Twins loss
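To illustrate how a small latent space can fuse the two modalities, here is a minimal NumPy sketch of a single Perceiver-style cross-attention step, in which a short array of 64-dimensional latents reads from the concatenated sensor and image tokens. Only the 64D latent width comes from the source. The token counts, the number of latents, the random weights, and the mean pooling are hypothetical choices made for illustration and do not describe the authors' architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(latents, tokens, w_q, w_k):
    """Scaled dot-product attention of each latent over all input tokens."""
    q = latents @ w_q                      # (num_latents, dim)
    k = tokens @ w_k                       # (num_tokens, dim)
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def cross_attend(latents, tokens, w_q, w_k, w_v):
    """One cross-attention step with a residual connection on the latents."""
    weights = attention_weights(latents, tokens, w_q, w_k)
    return latents + weights @ (tokens @ w_v)

rng = np.random.default_rng(0)
DIM = 64                                       # latent width stated in the abstract
sensor_tokens = rng.normal(size=(2, DIM))      # hypothetical: temperature, humidity
image_tokens = rng.normal(size=(16, DIM))      # hypothetical: 16 patch embeddings
tokens = np.concatenate([sensor_tokens, image_tokens], axis=0)

latents = rng.normal(size=(8, DIM))            # hypothetical: 8 latent vectors
w_q, w_k, w_v = (rng.normal(scale=DIM ** -0.5, size=(DIM, DIM)) for _ in range(3))

fused = cross_attend(latents, tokens, w_q, w_k, w_v)
pooled = fused.mean(axis=0)                    # one 64D vector for a classifier head
```

Because the latents are the only path from inputs to the classifier, their size caps how much the model can memorize, which is one way a narrow latent space can limit overfitting when only a few dozen training samples are available.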
