🤖 AI Summary
Galaxy morphology analysis faces a fundamental trade-off between computational cost and predictive performance: fully supervised training of domain-specific models is prohibitively expensive, while few-shot fine-tuning of general foundation models often falls short in accuracy. To address this, we propose GalaxAlign, the first tri-modal contrastive alignment framework integrating galaxy images, symbolic schematic diagrams, and textual descriptions. GalaxAlign injects domain-specific astronomical priors, encoded as interpretable, human-readable schematic diagrams, into a vision-language model (ViT + CLIP) via end-to-end joint embedding optimization, preserving zero-shot generalization while improving supervised performance. Experiments show that GalaxAlign achieves an 8.2% absolute accuracy gain over standard fine-tuning on both classification and similarity retrieval, matches fully supervised performance using only 10% of the labeled data, and accelerates inference by 3×. The approach thus enables lightweight, interpretable, and high-fidelity domain adaptation for astronomical image analysis.
📝 Abstract
Galaxy morphology analysis involves classifying galaxies by their shapes and structures. For this task, directly training domain-specific models on large, annotated astronomical datasets is effective but costly. In contrast, fine-tuning vision foundation models on a smaller set of astronomical images is more resource-efficient but generally results in lower accuracy. To harness the benefits of both approaches while addressing their shortcomings, we propose GalaxAlign, a novel method that fine-tunes pre-trained foundation models to achieve high accuracy on astronomical tasks. Specifically, our method extends a contrastive learning architecture to align three types of data during fine-tuning: (1) a set of schematic symbols representing galaxy shapes and structures, (2) textual labels of these symbols, and (3) galaxy images. This way, GalaxAlign not only eliminates the need for expensive pretraining but also enhances the effectiveness of fine-tuning. Extensive experiments on galaxy classification and similarity search demonstrate that our method effectively fine-tunes general pre-trained models for astronomical tasks by incorporating domain-specific multi-modal knowledge.
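To make the tri-modal alignment concrete, a natural reading of the abstract is three pairwise CLIP-style symmetric contrastive (InfoNCE) losses, one per modality pair: image-symbol, image-text, and symbol-text. The minimal pure-Python sketch below illustrates that objective on toy 2-D embeddings; the unweighted sum of the three pairwise losses and the temperature value are assumptions for illustration, not the paper's exact formulation.

```python
import math

def normalize(v):
    """L2-normalize a vector, as CLIP does before computing similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def clip_pair_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE over one modality pair (CLIP-style).

    a, b: lists of L2-normalized embeddings where a[i] matches b[i].
    Returns the mean of the a->b and b->a cross-entropy losses.
    """
    n = len(a)
    # Scaled cosine-similarity logits between every a_i and b_j.
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature
               for j in range(n)] for i in range(n)]

    def xent(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # log-sum-exp stabilization
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_z - row[i]  # -log softmax at the matching index
        return total / n

    cols = [list(c) for c in zip(*logits)]  # transpose for the b->a direction
    return 0.5 * (xent(logits) + xent(cols))

def trimodal_loss(img, sym, txt):
    """Sum of the three pairwise alignment losses (assumed equal weighting)."""
    return (clip_pair_loss(img, sym)
            + clip_pair_loss(img, txt)
            + clip_pair_loss(sym, txt))

# Toy batch: 3 galaxies, 2-D features per modality, roughly pre-aligned.
img = [normalize(v) for v in [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.2]]]
sym = [normalize(v) for v in [[0.9, 0.2], [0.2, 0.9], [-0.9, 0.1]]]
txt = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]]

loss = trimodal_loss(img, sym, txt)
print(loss)
```

Shuffling one modality (breaking the per-galaxy correspondence) increases the loss, which is what drives the joint embedding toward cross-modal agreement during fine-tuning.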