Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis

📅 2024-11-29

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Galaxy morphology analysis faces a fundamental trade-off between computational efficiency and predictive performance: fully supervised training is prohibitively expensive, while few-shot fine-tuning suffers from insufficient accuracy. To address this, we propose GalaxAlign, the first tri-modal contrastive alignment framework integrating images, symbolic schematic diagrams, and textual descriptions. GalaxAlign explicitly injects domain-specific astronomical priors—encoded as interpretable, human-readable schematic diagrams—into a vision-language model (ViT + CLIP) via end-to-end joint embedding optimization. This design simultaneously preserves zero-shot generalization capability and enhances supervised performance. Experiments demonstrate that GalaxAlign achieves an 8.2% absolute accuracy gain over standard fine-tuning on both classification and similarity retrieval tasks. Remarkably, it attains full-supervision-level performance using only 10% of labeled data and accelerates inference by 3×. The approach thus enables lightweight, interpretable, and high-fidelity domain adaptation for astronomical image analysis.

📝 Abstract

Galaxy morphology analysis involves classifying galaxies by their shapes and structures. For this task, directly training domain-specific models on large, annotated astronomical datasets is effective but costly. In contrast, fine-tuning vision foundation models on a smaller set of astronomical images is more resource-efficient but generally results in lower accuracy. To harness the benefits of both approaches and address their shortcomings, we propose GalaxAlign, a novel method that fine-tunes pre-trained foundation models to achieve high accuracy on astronomical tasks. Specifically, our method extends a contrastive learning architecture to align three types of data in fine-tuning: (1) a set of schematic symbols representing galaxy shapes and structures, (2) textual labels of these symbols, and (3) galaxy images. This way, GalaxAlign not only eliminates the need for expensive pretraining but also enhances the effectiveness of fine-tuning. Extensive experiments on galaxy classification and similarity search demonstrate that our method effectively fine-tunes general pre-trained models for astronomical tasks by incorporating domain-specific multi-modal knowledge.
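The three-way alignment described in the abstract can be pictured with a small contrastive-loss sketch. The abstract does not give the exact objective, so everything below is an assumption made for illustration: the function names (`info_nce`, `tri_modal_loss`), the symmetric InfoNCE form, the equal weighting of the three modality pairs, and the temperature of 0.07 are hypothetical choices, not the authors' implementation. Row i of each input is assumed to hold the embedding of the same galaxy class in each modality, with a different class in every row.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Symmetric InfoNCE loss: row i of `anchors` should match row i of `positives`."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    # logits[i][j] = cosine similarity of anchor i and positive j, divided by temperature
    logits = [
        [sum(x * y for x, y in zip(a[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed))


def tri_modal_loss(image_emb, symbol_emb, text_emb, temperature=0.07):
    """Average the pairwise losses over the three modality pairs (assumed equal weights)."""
    return (
        info_nce(image_emb, symbol_emb, temperature)
        + info_nce(image_emb, text_emb, temperature)
        + info_nce(symbol_emb, text_emb, temperature)
    ) / 3.0


if __name__ == "__main__":
    # Toy 3-class batch: perfectly aligned embeddings versus misaligned image embeddings.
    aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    misaligned = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
    print("aligned loss:   ", tri_modal_loss(aligned, aligned, aligned))
    print("misaligned loss:", tri_modal_loss(misaligned, aligned, aligned))
```

In this toy setup the loss is close to zero when the image, symbol, and text embeddings of each class coincide, and it grows when the image embeddings point at the wrong class. In an actual fine-tuning run the embeddings would come from the pre-trained encoders and the loss would be minimized by gradient descent in an autodiff framework.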

Problem

Research questions and friction points this paper is trying to address.

Improving galaxy classification accuracy with multimodal fine-tuning

Reducing reliance on expensive pretraining for morphology analysis

Enhancing similarity search using schematic symbols and text
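The similarity search task in the list above can be read as nearest-neighbour retrieval in the embedding space produced by the fine-tuned image encoder. The sketch below assumes cosine similarity as the ranking score and uses hypothetical names (`cosine_similarity`, `top_k_similar`); the summary does not state which metric or index the paper actually uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query, gallery, k=3):
    """Return (index, score) pairs for the k gallery embeddings most similar to `query`."""
    scored = [(cosine_similarity(query, emb), idx) for idx, emb in enumerate(gallery)]
    # Sort by score, highest first; ties keep their original gallery order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoded galaxy images.
    gallery = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    query = [1.0, 0.0]
    for idx, score in top_k_similar(query, gallery, k=2):
        print(idx, round(score, 4))
```

A brute-force scan like this is adequate for small galleries; for large survey catalogues an approximate nearest-neighbour index would normally replace the linear pass.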

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-modal alignment framework for galaxy analysis

Fine-tunes pre-trained foundation models with domain-specific multimodal knowledge

Incorporates schematic symbols, text labels and images
