GRAD: Generative Retrieval-Aligned Demonstration Sampler for Efficient Few-Shot Reasoning

📅 2025-10-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Traditional RAG relies on static knowledge bases, which yields low-relevance and poorly generalizing in-context examples. This paper introduces GRAD, a dynamic few-shot learning framework for mathematics and STEM domains that pioneers input-driven generative example sampling. GRAD employs a retrieval-alignment architecture coupled with token-budget control to efficiently produce concise, task-specific reasoning demonstrations: a small model is trained to generate high-quality demonstrations that guide large-model inference, substantially reducing both training and inference overhead. Evaluated on Qwen2.5-14B, GRAD consistently outperforms strong baselines and achieves superior out-of-distribution (OOD) generalization across physics, chemistry, and computer science, while maintaining low latency and high resource efficiency. Key innovations include input-adaptive demonstration generation, cross-distribution generalization, and a lightweight demonstration distillation mechanism.

๐Ÿ“ Abstract
Large Language Models (LLMs) achieve strong performance across diverse tasks, but their effectiveness often depends on the quality of the provided context. Retrieval-Augmented Generation (RAG) enriches prompts with external information, but its reliance on static databases constrains adaptability and can result in irrelevant demonstrations. In this work, we propose the Generative Retrieval-Aligned Demonstrator (GRAD), a dynamic demonstration-based approach in which an LLM is trained to generate concise, input-specific demonstrations. By tailoring demonstrations to each input, our method offers better contextual support than traditional RAG approaches. We demonstrate the superiority of GRAD under budget constraints, where we limit both the number of tokens used per demonstration and the number of tokens used for the final output. Trained solely on a math dataset, GRAD consistently outperforms strong baselines on Qwen2.5-14B across mathematical reasoning and advanced STEM questions, highlighting GRAD's robust generalization to out-of-distribution (OOD) domains such as physics, chemistry, and computer science. Furthermore, we show that demonstrations generated by trained smaller models can effectively guide larger target models, reducing training costs while maintaining competitive accuracy. Overall, this work introduces a scalable demonstration-generator model as a first step toward a dynamic few-shot learning paradigm in resource-constrained settings. We release the code used for the project.
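The inference pipeline the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: `small_model` and `large_model` stand in for real LLM calls (here simple stubs so the control flow is runnable), token counting is approximated by whitespace splitting, and the prompt format is invented for the example.

```python
def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Enforce a token budget by keeping only the first max_tokens tokens
    (whitespace split as a stand-in for a real tokenizer)."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])


def grad_answer(question: str, small_model, large_model,
                demo_budget: int = 64, answer_budget: int = 256) -> str:
    # 1. A small, trained demonstrator generates an input-specific
    #    demonstration, clipped to the demonstration token budget.
    demonstration = truncate_to_budget(small_model(question), demo_budget)

    # 2. The demonstration is prepended to the query as few-shot context
    #    for the larger target model; its output is also budget-limited.
    prompt = f"Demonstration:\n{demonstration}\n\nQuestion:\n{question}\nAnswer:"
    return truncate_to_budget(large_model(prompt), answer_budget)


# Stub "models" for illustration only.
small = lambda q: ("Example: to solve 2x + 3 = 7, subtract 3 from both "
                   "sides, then divide by 2, so x = 2.")
large = lambda p: "x = 4"

print(grad_answer("Solve 2x + 1 = 9.", small, large))
```

The point of the sketch is the division of labor: only the cheap demonstrator is trained, while the large target model consumes its output as in-context guidance under a fixed token budget.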
Problem

Research questions and friction points this paper is trying to address.

Dynamic demonstration generation for few-shot reasoning tasks
Overcoming static database limitations in retrieval-augmented generation
Improving contextual support under token budget constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates input-specific demonstrations dynamically
Outperforms baselines under token budget constraints
Uses smaller models to guide larger models efficiently
Oussama Gabouj
EPFL, Lausanne, Switzerland
Kamel Charaf
EPFL, Lausanne, Switzerland
Ivan Zakazov
EPFL, Lausanne, Switzerland
N. Baldwin
EPFL, Lausanne, Switzerland
Robert West
EPFL, Lausanne, Switzerland