AstroLLaVA: towards the unification of astronomical data and natural language

📅 2025-04-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of unified modeling of astronomical images and natural language by introducing AstroLLaVA, a vision-language model designed specifically for astronomy. Methodologically, the authors propose a two-stage domain-specific fine-tuning paradigm: first aligning images from diverse authoritative astronomical sources (e.g., NASA APOD, ESO, the Hubble Space Telescope; ~30K samples) with descriptive text, then performing supervised fine-tuning for visual question answering (VQA) on top of the LLaVA architecture. Key contributions include: (1) constructing a systematic, multimodal astronomical QA dataset; (2) outlining a general framework for aligning pre-trained language models with astronomical data across modalities; (3) demonstrating performance gains over general-purpose VLMs on an astronomical VQA benchmark; and (4) open-sourcing the model weights, code, and data as a scalable baseline for astronomical multimodal understanding.

📝 Abstract
We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. By fine-tuning the LLaVA model on a diverse dataset of ~30k images with captions and question-answer pairs sourced from NASA's 'Astronomy Picture of the Day', the European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we create a model capable of answering open-ended questions about astronomical concepts depicted visually. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain. We demonstrate AstroLLaVA's performance on an astronomical visual question answering benchmark and release the model weights, code, and training set to encourage further open source work in this space. Finally, we suggest a roadmap towards general astronomical data alignment with pre-trained language models, and provide an open space for collaboration towards this end for interested researchers.
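The two-stage process described above implies two kinds of training records: image-caption pairs for the first stage and image-grounded question-answer pairs for the second. A minimal sketch of how such records might be organized, assuming the LLaVA-style conversation JSON convention (the field names, prompt wording, and file paths below are illustrative, not taken from the released dataset):

```python
# Sketch of the two record types used across AstroLLaVA's two fine-tuning
# stages, in LLaVA-style conversation JSON. All names here are illustrative.

def caption_record(image_path, caption):
    """Stage-1 record: align an image with its descriptive caption."""
    return {
        "image": image_path,
        "conversations": [
            {"from": "human", "value": "<image>\nDescribe this astronomical image."},
            {"from": "gpt", "value": caption},
        ],
    }

def vqa_record(image_path, question, answer):
    """Stage-2 record: supervise open-ended visual question answering."""
    return {
        "image": image_path,
        "conversations": [
            {"from": "human", "value": f"<image>\n{question}"},
            {"from": "gpt", "value": answer},
        ],
    }

# Example records for a hypothetical APOD image.
stage1 = caption_record("apod/ngc1300.jpg",
                        "NGC 1300, a barred spiral galaxy imaged by Hubble.")
stage2 = vqa_record("apod/ngc1300.jpg",
                    "What type of galaxy is shown?",
                    "A barred spiral galaxy.")
```

Keeping both stages in one schema lets the same training loop consume captioning and VQA data, with only the human turn differing between stages.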
Problem

Research questions and friction points this paper is trying to address.

Enabling interaction with astronomical imagery via natural dialogue
Answering open-ended questions about visually depicted astronomical concepts
Aligning astronomical data with pre-trained language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes LLaVA on diverse astronomical datasets
Two-stage tuning for captioning and question answering
Releases model weights and code for open collaboration
Sharaf Zaman
UniverseTBD
Michael J. Smith
UniverseTBD
P. Khetarpal
UniverseTBD, Indian Institute of Technology Delhi
Rishabh Chakrabarty
UniverseTBD, Intelligent Internet Inc.
Michele Ginolfi
UniverseTBD, University of Florence
M. Huertas-Company
Instituto de Astrofísica de Canarias (IAC), Departamento Astrofísica, Universidad de la Laguna, Observatoire de Paris, LERMA, PSL University, Université Paris-Cité
Maja Jabłońska
UniverseTBD, ANU RSAA
Sandor Kruk
European Space Agency
Matthieu Le Lain
IRISA, Université Bretagne Sud
S. J. R. Méndez
UniverseTBD, ANU School of Computing
Dimitrios Tanoglidis
UniverseTBD

Astronomy · Artificial Intelligence · Data Science