🤖 AI Summary
This work addresses three limitations of existing all-sky cloud image datasets: short temporal coverage, daytime bias, and missing astrometric calibration. It presents LenghuSky-8, a predominantly nighttime (81.2%), eight-year (2018–2025) all-sky cloud dataset of 429,620 images. Pixel-level altitude-azimuth (Alt-Az) calibration is obtained through stellar astrometry and is accompanied by star-aware cloud masks and background masks. Cloud segmentation with a linear probe on DINOv3 local features reaches 93.3% ± 1.1% overall accuracy on a balanced, manually labeled set of 1,111 images, and the Alt-Az calibration uncertainty is approximately 0.37° at zenith and 1.34° at 30° altitude. The study also establishes short-horizon nowcasting baselines (persistence, optical flow, ConvLSTM, and VideoGPT). ConvLSTM performs best but improves only slightly on persistence, which underscores how hard near-term cloud evolution is to predict. The released data and toolkit target autonomous observatory scheduling.
📝 Abstract
Ground-based time-domain observatories require minute-by-minute, site-scale awareness of cloud cover, yet existing all-sky datasets are short, daylight-biased, or lack astrometric calibration. We present LenghuSky-8, an eight-year (2018–2025) all-sky imaging dataset from a premier astronomical site, comprising 429,620 $512 \times 512$ frames with 81.2% night-time coverage, star-aware cloud masks, background masks, and per-pixel altitude-azimuth (Alt-Az) calibration. For robust cloud segmentation across day, night, and lunar phases, we train a linear probe on DINOv3 local features and obtain 93.3% $\pm$ 1.1% overall accuracy on a balanced, manually labeled set of 1,111 images. Using stellar astrometry, we map each pixel to local Alt-Az coordinates and measure calibration uncertainties of approximately 0.37 deg at zenith and approximately 1.34 deg at 30 deg altitude, sufficient for integration with telescope schedulers. Beyond segmentation, we introduce a short-horizon nowcasting benchmark over per-pixel three-class logits (sky/cloud/contamination) with four baselines: persistence (copying the last frame), optical flow, ConvLSTM, and VideoGPT. ConvLSTM performs best but yields only limited gains over persistence, underscoring the difficulty of near-term cloud evolution. We release the dataset, calibrations, and an open-source toolkit for loading, evaluation, and scheduler-ready Alt-Az maps to support research in segmentation, nowcasting, and autonomous observatory operations.
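To make the persistence baseline concrete, the following is a minimal sketch of "copying the last frame" over per-pixel three-class logits, together with a simple pixel-accuracy score. It is not the released toolkit: the function names, the `(H, W, 3)` array layout, the class order (sky, cloud, contamination), and the use of argmax pixel accuracy as the metric are all assumptions made for illustration.

```python
import numpy as np


def persistence_forecast(last_logits: np.ndarray, horizon: int) -> np.ndarray:
    """Persistence baseline: repeat the last observed frame for every future step.

    last_logits: array of shape (H, W, 3), assumed to hold per-pixel logits
                 for the classes (sky, cloud, contamination) in that order.
    Returns an array of shape (horizon, H, W, 3).
    """
    return np.repeat(last_logits[None, ...], horizon, axis=0)


def pixel_accuracy(pred_logits: np.ndarray, true_logits: np.ndarray) -> float:
    """Fraction of pixels whose argmax class matches (an assumed metric)."""
    pred_cls = pred_logits.argmax(axis=-1)
    true_cls = true_logits.argmax(axis=-1)
    return float((pred_cls == true_cls).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: one observed frame and three future frames.
    last = rng.normal(size=(512, 512, 3))
    future = rng.normal(size=(3, 512, 512, 3))
    forecast = persistence_forecast(last, horizon=3)
    print(forecast.shape, pixel_accuracy(forecast, future))
```

Any learned model would be scored against the same future frames, so the persistence score acts as the floor that the abstract reports ConvLSTM only modestly exceeds.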