🤖 AI Summary
To address the scarcity of labeled data and the high annotation costs that hinder practical deployment of vision-language models in remote sensing, this paper proposes a few-shot vision-language reasoning framework grounded in verifiable rewards. The method is the first to bring reinforcement learning with verifiable rewards into satellite image understanding, replacing dense human annotation with lightweight, automatically checkable reward signals (e.g., IoU or binary correctness checks). Alignment is achieved efficiently via policy-gradient optimization, avoiding supervised fine-tuning on large annotated corpora. Remarkably, a single annotated example already yields substantial gains over the base model, and just 128 examples match or exceed models trained on thousands of supervised samples. Extensive evaluation across multiple remote sensing benchmarks demonstrates exceptional data efficiency, strong generalization, and high potential for real-world deployment.
📝 Abstract
Recent advances in large language and vision-language models have enabled strong reasoning capabilities, yet they remain impractical for specialized domains like remote sensing, where annotated data is scarce and expensive. We present the first few-shot reinforcement learning with verifiable rewards (RLVR) framework for satellite imagery that eliminates the need for caption supervision--relying solely on lightweight, rule-based rewards (binary correctness or IoU). Adapting the "1-shot RLVR" paradigm from language models to vision-language models, we employ policy-gradient optimization with as few as one curated example to align model outputs for satellite reasoning tasks. Comprehensive experiments across multiple remote sensing benchmarks--including classification, visual question answering, and grounding--show that even a single example yields substantial improvements over the base model. Scaling to 128 examples matches or exceeds models trained on thousands of annotated samples. While the extreme one-shot setting can induce mild, task-specific overfitting, our approach consistently demonstrates robust generalization and efficiency across diverse tasks. Further, we find that prompt design and loss weighting significantly influence training stability and final accuracy. Our method enables cost-effective and data-efficient development of domain-specialist vision-language reasoning models, offering a pragmatic recipe for data-scarce fields: start from a compact VLM, curate a handful of reward-checkable cases, and train via RLVR.
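To make the "rule-based binary or IoU-based rewards" concrete, here is a minimal sketch of the two reward types the abstract describes. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are illustrative assumptions, not the paper's exact implementation; in an RLVR loop these scalar rewards would weight the policy-gradient update in place of caption supervision.

```python
def iou_reward(pred_box, gt_box, threshold=0.5):
    """Verifiable reward for grounding: 1.0 if the IoU between the
    predicted and ground-truth axis-aligned boxes meets the threshold.
    Boxes are (x1, y1, x2, y2) tuples -- an assumed format."""
    # Intersection rectangle
    ix1 = max(pred_box[0], gt_box[0])
    iy1 = max(pred_box[1], gt_box[1])
    ix2 = min(pred_box[2], gt_box[2])
    iy2 = min(pred_box[3], gt_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_gt = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = area_pred + area_gt - inter
    iou = inter / union if union > 0 else 0.0
    return 1.0 if iou >= threshold else 0.0

def binary_reward(pred_answer, gt_answer):
    """Verifiable reward for classification / VQA: exact match after
    trivial normalization, no human grading or caption needed."""
    return 1.0 if pred_answer.strip().lower() == gt_answer.strip().lower() else 0.0
```

Both rewards are cheap to compute and require only the task's ground-truth label or box, which is what makes the one-shot and 128-shot regimes feasible without dense annotation.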