Radio Astronomy in the Era of Vision-Language Models: Prompt Sensitivity and Adaptation

📅 2025-08-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study evaluates the zero-shot generalization of general-purpose vision-language models (VLMs) for radio galaxy morphology classification (FR-I vs. FR-II) without astronomical pretraining. To address domain distribution shift and prompt sensitivity in scientific imaging, we introduce visual in-context examples, a first for astronomy, into prompt engineering, and we systematically analyze the effects of natural language prompts, schematic guidance, and LoRA-based fine-tuning. Results show that raw VLM outputs are highly sensitive to prompt formulation; however, lightweight LoRA fine-tuning (15M parameters) of Qwen-VL reduces the classification error to 3%, matching the performance of domain-specific models. Our work demonstrates the strong transfer potential of general VLMs for few-shot scientific image understanding and proposes an efficient adaptation paradigm: “prompt robustness optimization + minimal fine-tuning.”
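The summary cites a roughly 15M-parameter LoRA adapter. As an illustrative sketch (not the paper's code), the trainable-parameter count of a LoRA adapter follows directly from the adapter rank and the shapes of the adapted projection layers; the layer dimensions and rank below are hypothetical, chosen only to show how a count in this ballpark arises:

```python
# For each adapted linear layer W (d_out x d_in), LoRA adds two low-rank
# factors A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out)
# trainable parameters, while W itself stays frozen.
def lora_param_count(layer_shapes, rank):
    """layer_shapes: list of (d_in, d_out) tuples for the adapted layers."""
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)

# Hypothetical example: adapting the q/k/v/o projections (4096 x 4096 each)
# in 32 transformer blocks with rank r = 16.
shapes = [(4096, 4096)] * 4 * 32
print(lora_param_count(shapes, rank=16))  # 16777216 (~16.8M)
```

Under these assumed shapes the adapter lands near the 15M figure quoted above, which is why small ranks over the attention projections suffice.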

📝 Abstract
Vision-Language Models (VLMs), such as recent Qwen and Gemini models, are positioned as general-purpose AI systems capable of reasoning across domains. Yet their capabilities in scientific imaging, especially on unfamiliar and potentially previously unseen data distributions, remain poorly understood. In this work, we assess whether generic VLMs, presumed to lack exposure to astronomical corpora, can perform morphology-based classification of radio galaxies using the MiraBest FR-I/FR-II dataset. We explore prompting strategies using natural language and schematic diagrams, and, to the best of our knowledge, we are the first to introduce visual in-context examples within prompts in astronomy. Additionally, we evaluate lightweight supervised adaptation via LoRA fine-tuning. Our findings reveal three trends: (i) even prompt-based approaches can achieve good performance, suggesting that VLMs encode useful priors for unfamiliar scientific domains; (ii) however, outputs are highly unstable, i.e., varying sharply with superficial prompt changes such as layout, ordering, or decoding temperature, even when semantic content is held constant; and (iii) with just 15M trainable parameters and no astronomy-specific pretraining, fine-tuned Qwen-VL achieves near state-of-the-art performance (3% error rate), rivaling domain-specific models. These results suggest that the apparent "reasoning" of VLMs often reflects prompt sensitivity rather than genuine inference, raising caution for their use in scientific domains. At the same time, with minimal adaptation, generic VLMs can rival specialized models, offering a promising but fragile tool for scientific discovery.
Problem

Research questions and friction points this paper is trying to address.

Assessing VLMs' ability to classify radio galaxy morphologies
Evaluating prompt sensitivity and stability in scientific applications
Exploring lightweight adaptation methods for astronomy tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using natural language and schematic prompts
Introducing visual in-context examples in astronomy
Lightweight supervised adaptation via LoRA fine-tuning
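The visual in-context examples mentioned above amount to interleaving labeled reference images with the query image inside one prompt. A minimal sketch of how such a prompt might be assembled is shown below; the message schema mirrors common multimodal chat APIs and the file names are purely illustrative, not the paper's actual format:

```python
def build_icl_messages(shots, query_image):
    """Build a multimodal chat prompt with visual in-context examples.
    shots: list of (image_ref, label) pairs shown before the query.
    The dict schema here is illustrative, not a specific model's API."""
    content = [{"type": "text",
                "text": "Classify each radio galaxy as FR-I or FR-II."}]
    for image_ref, label in shots:
        content.append({"type": "image", "image": image_ref})
        content.append({"type": "text", "text": f"Label: {label}"})
    # The unlabeled query image comes last, prompting the model to
    # complete the pattern established by the examples.
    content.append({"type": "image", "image": query_image})
    content.append({"type": "text", "text": "Label:"})
    return [{"role": "user", "content": content}]

msgs = build_icl_messages([("fr1_example.png", "FR-I"),
                           ("fr2_example.png", "FR-II")], "query.png")
```

Note that, per the abstract's finding (ii), the ordering of the `shots` list is exactly the kind of superficial choice that can swing the model's output.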
Mariia Drozdova
University of Geneva, Switzerland
Erica Lastufka
University of Geneva, Switzerland
Vitaliy Kinakh
University of Geneva
Deep Learning · Machine Learning · Applied Mathematics · Artificial Intelligence
Taras Holotyak
Researcher, Department of Computer Science, University of Geneva
Daniel Schaerer
University of Geneva, Switzerland
Slava Voloshynovskiy
University of Geneva, Switzerland