Ada-RS: Adaptive Rejection Sampling for Selective Thinking

📅 2026-02-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the inefficiency of large language models (LLMs), which often expend excessive computation on over-reasoning for simple tasks and struggle to balance efficiency with accuracy. To tackle this, we propose Ada-RS, an adaptive rejection sampling framework that introduces rejection sampling into reasoning-path selection for the first time. Ada-RS evaluates multiple sampled paths using a length-penalized reward and stochastically rejects low-value paths, retaining only high-quality candidates for downstream preference optimization. This approach dynamically balances reasoning depth and computational efficiency, remains compatible with preference-alignment algorithms such as DPO and DAPO, and integrates seamlessly with LoRA-based fine-tuning. Experiments on Qwen3-8B demonstrate that Ada-RS reduces output token usage by up to 80% and cuts reasoning overhead by up to 95%, while maintaining or even improving tool-calling accuracy.

📝 Abstract
Large language models (LLMs) are increasingly being deployed in cost- and latency-sensitive settings. While chain-of-thought improves reasoning, it can waste tokens on simple requests. We study selective thinking for tool-using LLMs and introduce Adaptive Rejection Sampling (Ada-RS), an algorithm-agnostic sample-filtering framework for learning selective and efficient reasoning. For each given context, Ada-RS scores multiple sampled completions with an adaptive length-penalized reward, then applies stochastic rejection sampling to retain only high-reward candidates (or preference pairs) for downstream optimization. We demonstrate how Ada-RS plugs into both preference-pair (e.g., DPO) and grouped policy optimization (e.g., DAPO) strategies. Using Qwen3-8B with LoRA on a synthetic tool-call-oriented e-commerce benchmark, Ada-RS improves the accuracy-efficiency frontier over standard algorithms, reducing average output tokens by up to 80% and thinking rate by up to 95% while maintaining or improving tool-call accuracy. These results highlight that training-signal selection is a powerful lever for efficient reasoning in latency-sensitive deployments.
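The filtering step described in the abstract can be sketched in a few lines. The code below is an illustrative approximation, not the paper's implementation: the per-token penalty form and the sigmoid acceptance rule are assumptions, and the function names (`length_penalized_reward`, `ada_rs_filter`) are hypothetical.

```python
import math
import random

def length_penalized_reward(base_reward, num_tokens, penalty=0.001):
    # Assumed penalty form: subtract a per-token cost from the task reward.
    # The paper's adaptive penalty may be computed differently.
    return base_reward - penalty * num_tokens

def ada_rs_filter(candidates, temperature=1.0, rng=None):
    """Stochastically reject low-reward sampled completions.

    candidates: list of (completion, base_reward, num_tokens) tuples.
    Returns the retained completions for downstream preference optimization.
    Acceptance probability is a sigmoid of the reward gap to the batch mean;
    this is an illustrative rejection rule, not the paper's exact one.
    """
    rng = rng or random.Random(0)
    rewards = [length_penalized_reward(r, n) for _, r, n in candidates]
    mean_r = sum(rewards) / len(rewards)
    kept = []
    for (completion, _, _), r in zip(candidates, rewards):
        p_accept = 1.0 / (1.0 + math.exp(-(r - mean_r) / temperature))
        if rng.random() < p_accept:
            kept.append(completion)
    return kept
```

Under this sketch, a short correct completion outscores an equally correct but verbose one, so verbose reasoning paths are rejected more often and the retained set biases training toward efficient responses.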
Problem

Research questions and friction points this paper is trying to address.

selective thinking
efficient reasoning
large language models
latency-sensitive deployment
token efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Rejection Sampling
Selective Thinking
Efficient Reasoning
Preference Optimization
Token Efficiency
Yirou Ge
PayPal AI
Yixi Li
PayPal AI
Alec Chiu
PayPal AI
Shivani Shekhar
PayPal AI
Zijie Pan
University of Connecticut
Machine Learning, Deep Learning, Graph Neural Networks, Time Series
Avinash Thangali
PayPal AI
Yun-Shiuan Chuang
PayPal AI
Chaitanya Kulkarni
PayPal AI
Uma Kona
PayPal AI
Linsey Pang
PayPal AI
Prakhar Mehrotra
PayPal AI