AstroLLaVA: towards the unification of astronomical data and natural language

📅 2025-04-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of unified modeling of astronomical images and natural language by introducing AstroLLaVA, a vision-language model designed specifically for astronomy. Methodologically, the authors propose a two-stage domain-specific fine-tuning paradigm: first aligning images from diverse authoritative astronomical sources (e.g., NASA APOD, ESO, the Hubble Space Telescope; ~30K samples) with descriptive text, then performing supervised fine-tuning for visual question answering (VQA) on top of the LLaVA architecture. Key contributions include: (1) constructing a systematic, multimodal astronomical QA dataset; (2) outlining a general framework for aligning pre-trained language models with astronomical data across modalities; (3) demonstrating performance gains over general-purpose VLMs on an astronomical VQA benchmark; and (4) open-sourcing the model weights, code, and data as a scalable baseline for astronomical multimodal understanding.

📝 Abstract
We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. By fine-tuning the LLaVA model on a diverse dataset of ~30k images with captions and question-answer pairs sourced from NASA's 'Astronomy Picture of the Day', the European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we create a model capable of answering open-ended questions about astronomical concepts depicted visually. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain. We demonstrate AstroLLaVA's performance on an astronomical visual question answering benchmark and release the model weights, code, and training set to encourage further open source work in this space. Finally, we suggest a roadmap towards general astronomical data alignment with pre-trained language models, and provide an open space for collaboration towards this end for interested researchers.
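The two-stage process described above implies two kinds of training records: image-caption pairs for the first stage and image-grounded question-answer pairs for the second. A minimal sketch of how such records might be organized, assuming the LLaVA-style conversation JSON convention (the field names, prompt wording, and file paths below are illustrative, not taken from the released dataset):

```python
# Sketch of the two record types used across AstroLLaVA's two fine-tuning
# stages, in LLaVA-style conversation JSON. All names here are illustrative.

def caption_record(image_path, caption):
    """Stage-1 record: align an image with its descriptive caption."""
    return {
        "image": image_path,
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe this astronomical image."},
            {"from": "gpt", "value": caption},
        ],
    }

def vqa_record(image_path, question, answer):
    """Stage-2 record: supervise open-ended visual question answering."""
    return {
        "image": image_path,
        "conversations": [
            {"from": "human", "value": f"<image>\n{question}"},
            {"from": "gpt", "value": answer},
        ],
    }

# Example records for a hypothetical APOD image.
stage1 = caption_record("apod/ngc1300.jpg",
                        "NGC 1300, a barred spiral galaxy imaged by Hubble.")
stage2 = vqa_record("apod/ngc1300.jpg",
                    "What type of galaxy is shown?",
                    "A barred spiral galaxy.")
```

Keeping both stages in one schema lets the same training loop consume captioning and VQA data, with only the human turn differing between stages.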
Problem

Research questions and friction points this paper is trying to address.

Enabling interaction with astronomical imagery via natural dialogue
Answering open-ended questions about visually depicted astronomical concepts
Aligning astronomical data with pre-trained language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes LLaVA on diverse astronomical datasets
Two-stage tuning for captioning and question answering
Releases model weights and code for open collaboration
Sharaf Zaman
UniverseTBD
Michael J. Smith
UniverseTBD
P. Khetarpal
UniverseTBD, Indian Institute of Technology Delhi
Rishabh Chakrabarty
UniverseTBD, Intelligent Internet Inc.
Michele Ginolfi
UniverseTBD, University of Florence
M. Huertas-Company
Instituto de Astrofísica de Canarias (IAC), Departamento Astrofísica, Universidad de la Laguna, Observatoire de Paris, LERMA, PSL University, Université Paris-Cité
Maja Jabłońska
UniverseTBD, ANU RSAA
Sandor Kruk
European Space Agency
Matthieu Le Lain
IRISA, Université Bretagne Sud
S. J. R. Méndez
UniverseTBD, ANU School of Computing
Dimitrios Tanoglidis
UniverseTBD

Astronomy · Artificial Intelligence · Data Science