Combining datasets with different ground truths using Low-Rank Adaptation to generalize image-based CNN models for photometric redshift prediction

📅 2026-01-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the complementary limitations of photometric redshifts (broad coverage but low precision) and spectroscopic redshifts (high accuracy but sparse sampling) by proposing a fusion approach based on Low-Rank Adaptation (LoRA). Building upon a pre-trained convolutional neural network (CNN), the method is efficiently fine-tuned on spectroscopic redshift data with minimal computational overhead. To the authors' knowledge, this is the first application of LoRA to astronomical redshift regression; it demonstrates marked improvements over conventional transfer learning, reducing bias by approximately 2.5 times and scatter by about 2.2 times. The approach substantially enhances model generalization while maintaining computational efficiency, offering a new paradigm for astrophysical regression tasks under data sparsity.
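The bias and scatter figures above can be made concrete with the evaluation metrics commonly used in photometric-redshift work: the mean normalized residual (bias) and the normalized median absolute deviation (scatter). This is a minimal sketch assuming those standard definitions; the paper's exact metric definitions may differ, and the sample values below are illustrative, not from the paper.

```python
import numpy as np

def photoz_metrics(z_pred, z_true):
    """Bias and NMAD scatter of normalized redshift residuals.

    Assumes the conventional photo-z definitions: residuals are
    normalized by (1 + z_true), and scatter is the robust NMAD
    estimator (1.4826 * median absolute deviation).
    """
    dz = (z_pred - z_true) / (1.0 + z_true)   # normalized residuals
    bias = float(np.mean(dz))                  # systematic offset
    nmad = float(1.4826 * np.median(np.abs(dz - np.median(dz))))
    return bias, nmad

# Illustrative (made-up) predictions for four galaxies.
z_true = np.array([0.5, 1.0, 1.5, 2.0])
z_pred = np.array([0.52, 0.98, 1.55, 1.97])
bias, scatter = photoz_metrics(z_pred, z_true)
```

A "2.5x less bias" claim then means the LoRA model's `bias` is roughly 2.5 times closer to zero than the transfer-learning baseline's on the same test set.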

📝 Abstract
In this work, we demonstrate how Low-Rank Adaptation (LoRA) can be used to combine different galaxy imaging datasets to improve redshift estimation with CNN models for cosmology. LoRA is an established technique for large language models that adds adapter networks to adjust model weights and biases, efficiently fine-tuning large base models without retraining them. We train a base model using a photometric redshift ground truth dataset, which contains broad galaxy types but is less accurate. We then fine-tune using LoRA on a spectroscopic redshift ground truth dataset. These redshifts are more accurate but limited to bright galaxies and take orders of magnitude more time to obtain, so are less available for large surveys. Ideally, the combination of the two datasets would yield more accurate models that generalize well. The LoRA model performs better than a traditional transfer learning method, with $\sim2.5\times$ less bias and $\sim2.2\times$ less scatter. Retraining the model on a combined dataset yields a model that generalizes better than LoRA but at a cost of greater computation time. Our work shows that LoRA is useful for fine-tuning regression models in astrophysics by providing a middle ground between full retraining and no retraining. LoRA shows potential in allowing us to leverage existing pretrained astrophysical models, especially for data-sparse tasks.
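The adapter mechanism the abstract describes can be sketched as follows. In standard LoRA, a frozen base weight $W$ is augmented by a trainable low-rank update $(\alpha/r)\,BA$, so fine-tuning touches only $r(d_{in}+d_{out})$ parameters instead of $d_{in}d_{out}$. This is a generic illustration of the technique, not the paper's CNN architecture; all names and shapes are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 32, 4, 8   # illustrative sizes, not from the paper

W = rng.standard_normal((d_out, d_in))     # frozen base-model weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-init

def lora_forward(x):
    """Base layer output plus the scaled low-rank correction (alpha/r) * B @ A @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Zero-initializing B means the adapted layer starts exactly at the base model,
# so fine-tuning begins from the pretrained (photometric) solution.
assert np.allclose(lora_forward(x), W @ x)

trainable = A.size + B.size   # 4*64 + 32*4 = 384 parameters
full = W.size                 # 32*64 = 2048 parameters
```

Only `A` and `B` would receive gradient updates during the spectroscopic fine-tuning stage, which is what makes the adaptation cheap relative to full retraining.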
Problem

Research questions and friction points this paper is trying to address.

photometric redshift
spectroscopic redshift
dataset combination
CNN models
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-Rank Adaptation
photometric redshift
CNN
transfer learning
spectroscopic redshift
Vikram Seenivasan
Department of Physics and Astronomy, UCLA, Los Angeles, CA 90095
Srinath Saikrishnan
Department of Computer Science, UCLA, Los Angeles, CA 90095
Andrew Lizarraga
PhD Student, UCLA
Representation Learning · Generative Modeling · Statistics
Jonathan Soriano
Department of Physics and Astronomy, UCLA, Los Angeles, CA 90095
Bernie Boscoe
Department of Computer Science, Southern Oregon University, Ashland, OR, 97520
Tuan Do
Department of Physics and Astronomy, UCLA, Los Angeles, CA 90095