🤖 AI Summary
Accurately characterizing the intrinsic dimensionality (iD) of astronomical datasets—particularly Radio Galaxy Zoo (RGZ)—remains challenging, yet iD is critical for assessing representation quality, self-supervised learning efficacy, and anomaly detection. Method: We estimate iD using a score-based diffusion model and systematically analyze its relationships with Bayesian neural network (BNN) energy scores, Fanaroff-Riley (FR) morphological classes (FR I vs. FR II), and signal-to-noise ratio (SNR). Contribution/Results: We report the first empirical evidence that out-of-distribution radio sources exhibit significantly higher iD than in-distribution ones, and that RGZ’s overall iD exceeds that of natural image datasets. iD shows a strong negative correlation with BNN energy scores and a weak negative correlation with SNR; however, no statistically significant difference in iD is observed between FR I and FR II sources. This work establishes a novel paradigm for evaluating representation quality in astrophysical data and extends the theoretical interpretability and practical applicability of iD in self-supervised learning and anomaly detection.
📝 Abstract
In this work, we estimate the intrinsic dimension (iD) of the Radio Galaxy Zoo (RGZ) dataset using a score-based diffusion model. We examine how the iD estimates vary as a function of Bayesian neural network (BNN) energy scores, which measure how similar the radio sources are to the MiraBest subset of the RGZ dataset. We find that out-of-distribution sources exhibit higher iD values, and that the overall iD for RGZ exceeds those typically reported for natural image datasets. Furthermore, we analyse how iD varies across Fanaroff-Riley (FR) morphological classes and as a function of the signal-to-noise ratio (SNR). While no significant difference in iD is found between the FR I and FR II classes, we observe a weak trend toward higher SNR at lower iD. Future work using the RGZ dataset could make use of the relationship between iD and energy scores to quantitatively study and improve the representations learned by various self-supervised learning algorithms.
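The paper estimates iD with a score-based diffusion model, which is involved to reproduce. As a lightweight illustration of what "intrinsic dimension" measures, the sketch below uses the classical TwoNN estimator (ratios of second- to first-nearest-neighbour distances) on synthetic data whose true iD is known; the dataset, dimensions, and estimator here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def twonn_id(X):
    """TwoNN intrinsic-dimension estimate (toy illustration, not the
    paper's diffusion-based estimator).

    For each point, compute mu = r2 / r1, the ratio of distances to its
    second and first nearest neighbours; the MLE of the intrinsic
    dimension is then N / sum(log mu).
    """
    # Full pairwise distance matrix (fine for small N).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # exclude self-distances
    D.sort(axis=1)
    r1, r2 = D[:, 0], D[:, 1]
    mu = r2 / r1
    return len(X) / np.sum(np.log(mu))

rng = np.random.default_rng(0)
# Synthetic data: a 2-D linear manifold embedded in 10-D ambient space,
# so the true intrinsic dimension is 2 even though points live in R^10.
latent = rng.normal(size=(500, 2))
embedding = rng.normal(size=(2, 10))
X = latent @ embedding
print(f"estimated iD: {twonn_id(X):.2f}")  # close to the true value of 2
```

Applied to image cutouts flattened into vectors, an estimator like this returns a number far below the pixel count, which is the sense in which the paper compares iD across RGZ, MiraBest, and natural image datasets.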