FujiView: Multimodal Late-Fusion for Predicting Scenic Visibility

📅 2026-02-25
🤖 AI Summary
This study addresses the challenge of accurately predicting natural scenic visibility, which is highly sensitive to dynamic atmospheric conditions and significantly impacts tourism experiences. To this end, the authors introduce a novel task termed Scenic Visibility Forecasting (SVF) and present FujiView, a late-fusion framework that leverages multimodal data from the first large-scale dataset for Mount Fuji visibility. The approach combines visual features extracted from webcam images using YOLO with numerical weather prediction data to forecast visibility across five discrete categories. Experimental results demonstrate that FujiView achieves prediction accuracies of 89% for same-day and 84% for next-day forecasts, confirming the efficacy of the late-fusion strategy in multimodal environmental perception. Furthermore, the analysis reveals the shifting dominance of visual versus meteorological features across different forecasting horizons.

📝 Abstract
Visibility of natural landmarks such as Mount Fuji is a defining factor in both tourism planning and visitor experience, yet it remains difficult to predict due to rapidly changing atmospheric conditions. We present FujiView, a multimodal learning framework and dataset for predicting scenic visibility by fusing webcam imagery with structured meteorological data. Our late-fusion approach combines image-derived class probabilities with numerical weather features to classify visibility into five categories. The dataset currently comprises over 100,000 webcam images paired with concurrent and forecasted weather conditions from more than 40 cameras around Mount Fuji, and continues to expand; it will be released to support further research in environmental forecasting. Experiments show that YOLO-based vision features dominate short-term horizons such as "nowcasting" and "samedaycasting", while weather-driven forecasts increasingly take over as the primary predictive signal beyond the +1-day horizon. Late fusion consistently yields the highest overall accuracy, reaching approximately 89% for same-day prediction and up to 84% for next-day forecasts. These results position Scenic Visibility Forecasting (SVF) as a new benchmark task for multimodal learning.
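The late-fusion step described above can be sketched as follows. The paper does not specify the exact fusion implementation, so this is a minimal illustration under stated assumptions: the vision branch emits a softmax probability vector over the five visibility categories, the weather branch supplies a small numerical feature vector (the feature names here are hypothetical), and the two are concatenated before a final classifier.

```python
import numpy as np

NUM_CLASSES = 5  # five discrete visibility categories (per the abstract)

def late_fusion_features(image_probs, weather_feats):
    """Concatenate image-branch class probabilities with weather features.

    image_probs: length-5 probability vector from the vision model
                 (e.g. a YOLO-based classifier head)
    weather_feats: numerical weather-prediction features; the specific
                   variables used are an assumption, not from the paper
    """
    image_probs = np.asarray(image_probs, dtype=float)
    weather_feats = np.asarray(weather_feats, dtype=float)
    assert image_probs.shape == (NUM_CLASSES,), "expected 5 class probabilities"
    return np.concatenate([image_probs, weather_feats])

# Hypothetical example: a confident "good visibility" image prediction
# fused with [temperature (C), relative humidity, wind speed (m/s)].
probs = [0.05, 0.05, 0.10, 0.20, 0.60]
weather = [12.0, 0.45, 3.2]
fused = late_fusion_features(probs, weather)
print(fused.shape)  # (8,)
```

The fused vector would then feed a downstream classifier over the five categories; weighting the two branches per forecast horizon (vision-heavy for nowcasting, weather-heavy beyond one day) would be a natural extension of this scheme, consistent with the horizon analysis reported above.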
Problem

Research questions and friction points this paper is trying to address.

Scenic Visibility
Visibility Prediction
Multimodal Learning
Environmental Forecasting
Mount Fuji
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal late-fusion
scenic visibility forecasting
YOLO-based vision features
environmental forecasting
webcam-meteorological dataset