Radio Astronomy in the Era of Vision-Language Models: Prompt Sensitivity and Adaptation

📅 2025-08-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study evaluates the zero-shot generalization of general-purpose vision-language models (VLMs) for radio galaxy morphology classification (FR-I vs. FR-II) without astronomical pretraining. To address domain distribution shift and prompt sensitivity in scientific imaging, we introduce visual in-context examples, a first for astronomy, into prompt engineering, and we systematically analyze the effects of natural language prompts, schematic guidance, and LoRA-based fine-tuning. Results show that raw VLM outputs are highly sensitive to prompt formulation; however, lightweight LoRA fine-tuning (15M parameters) of Qwen-VL reduces the classification error to 3%, matching the performance of domain-specific models. Our work demonstrates the strong transfer potential of general VLMs for few-shot scientific image understanding and proposes an efficient adaptation paradigm: “prompt robustness optimization + minimal fine-tuning.”
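The summary cites a roughly 15M-parameter LoRA adapter. As an illustrative sketch (not the paper's code), the trainable-parameter count of a LoRA adapter follows directly from the adapter rank and the shapes of the adapted projection layers; the layer dimensions and rank below are hypothetical, chosen only to show how a count in this ballpark arises:

```python
# For each adapted linear layer W (d_out x d_in), LoRA adds two low-rank
# factors A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out)
# trainable parameters, while W itself stays frozen.
def lora_param_count(layer_shapes, rank):
    """layer_shapes: list of (d_in, d_out) tuples for the adapted layers."""
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)

# Hypothetical example: adapting the q/k/v/o projections (4096 x 4096 each)
# in 32 transformer blocks with rank r = 16.
shapes = [(4096, 4096)] * 4 * 32
print(lora_param_count(shapes, rank=16))  # 16777216 (~16.8M)
```

Under these assumed shapes the adapter lands near the 15M figure quoted above, which is why small ranks over the attention projections suffice.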

📝 Abstract
Vision-Language Models (VLMs), such as recent Qwen and Gemini models, are positioned as general-purpose AI systems capable of reasoning across domains. Yet their capabilities in scientific imaging, especially on unfamiliar and potentially previously unseen data distributions, remain poorly understood. In this work, we assess whether generic VLMs, presumed to lack exposure to astronomical corpora, can perform morphology-based classification of radio galaxies using the MiraBest FR-I/FR-II dataset. We explore prompting strategies using natural language and schematic diagrams, and, to the best of our knowledge, we are the first to introduce visual in-context examples within prompts in astronomy. Additionally, we evaluate lightweight supervised adaptation via LoRA fine-tuning. Our findings reveal three trends: (i) even prompt-based approaches can achieve good performance, suggesting that VLMs encode useful priors for unfamiliar scientific domains; (ii) however, outputs are highly unstable, i.e., varying sharply with superficial prompt changes such as layout, ordering, or decoding temperature, even when semantic content is held constant; and (iii) with just 15M trainable parameters and no astronomy-specific pretraining, fine-tuned Qwen-VL achieves near state-of-the-art performance (3% error rate), rivaling domain-specific models. These results suggest that the apparent "reasoning" of VLMs often reflects prompt sensitivity rather than genuine inference, raising caution for their use in scientific domains. At the same time, with minimal adaptation, generic VLMs can rival specialized models, offering a promising but fragile tool for scientific discovery.
Problem

Research questions and friction points this paper is trying to address.

Assessing VLMs' ability to classify radio galaxy morphologies
Evaluating prompt sensitivity and stability in scientific applications
Exploring lightweight adaptation methods for astronomy tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using natural language and schematic prompts
Introducing visual in-context examples in astronomy
Lightweight supervised adaptation via LoRA fine-tuning
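The visual in-context examples mentioned above amount to interleaving labeled reference images with the query image inside one prompt. A minimal sketch of how such a prompt might be assembled is shown below; the message schema mirrors common multimodal chat APIs and the file names are purely illustrative, not the paper's actual format:

```python
def build_icl_messages(shots, query_image):
    """Build a multimodal chat prompt with visual in-context examples.
    shots: list of (image_ref, label) pairs shown before the query.
    The dict schema here is illustrative, not a specific model's API."""
    content = [{"type": "text",
                "text": "Classify each radio galaxy as FR-I or FR-II."}]
    for image_ref, label in shots:
        content.append({"type": "image", "image": image_ref})
        content.append({"type": "text", "text": f"Label: {label}"})
    # The unlabeled query image comes last, prompting the model to
    # complete the pattern established by the examples.
    content.append({"type": "image", "image": query_image})
    content.append({"type": "text", "text": "Label:"})
    return [{"role": "user", "content": content}]

msgs = build_icl_messages([("fr1_example.png", "FR-I"),
                           ("fr2_example.png", "FR-II")], "query.png")
```

Note that, per the abstract's finding (ii), the ordering of the `shots` list is exactly the kind of superficial choice that can swing the model's output.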
Mariia Drozdova
University of Geneva, Switzerland
Erica Lastufka
University of Geneva, Switzerland
Vitaliy Kinakh
University of Geneva
Deep Learning · Machine Learning · Applied Mathematics · Artificial Intelligence
Taras Holotyak
Researcher, Department of Computer Science, University of Geneva
Daniel Schaerer
University of Geneva, Switzerland
Slava Voloshynovskiy
University of Geneva, Switzerland