From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation

📅 2025-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates key factors for improving reasoning performance in many-shot in-context learning (ICL) for large language models (LLMs), revealing that example quality substantially outweighs quantity—performance gains are predominantly driven by a small set of high-impact examples. To address this, we propose BRIDGE: the first closed-loop ICL optimization framework that tightly integrates Bayesian optimization with example regeneration. BRIDGE iteratively evaluates example influence, identifies critical demonstration samples, and synthesizes enhanced reasoning paths. The method is compatible with long-context models (e.g., Gemini, Claude, Mistral) and requires no model fine-tuning. Empirically, BRIDGE achieves significant improvements over strong baselines across symbolic reasoning, numerical reasoning, and code generation tasks. It demonstrates robust generalization across diverse model scales and architectures. By enabling efficient, interpretable, and sample-efficient ICL optimization, BRIDGE establishes a novel paradigm for principled demonstration selection and refinement.

📝 Abstract
Recent advances in long-context large language models (LLMs) have led to the emerging paradigm of many-shot in-context learning (ICL), where it is observed that scaling to many more demonstration examples beyond the conventional few-shot setup can lead to performance benefits. However, despite its promise, it is unclear what aspects dominate the benefits and whether simply scaling to more examples is the most effective way of improving many-shot ICL. In this work, we first provide an analysis of the factors driving many-shot ICL, and we find that 1) many-shot performance can often still be attributed to a few disproportionately influential examples and 2) identifying such influential examples ("optimize") and using them as demonstrations to regenerate new examples ("generate") can lead to further improvements. Inspired by these findings, we propose BRIDGE, an algorithm that alternates between the optimize step, which uses Bayesian optimization to discover the influential sets of examples, and the generate step, which reuses this set to automatically expand the reasoning paths of the examples back to the many-shot regime. On Gemini, Claude, and Mistral LLMs of different sizes, we show that BRIDGE leads to significant improvements across a diverse set of tasks, including symbolic reasoning, numerical reasoning, and code generation.
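The alternating loop the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `evaluate` stands in for validation accuracy under a candidate demonstration subset, `regenerate` stands in for LLM-based example regeneration, and the paper's Bayesian optimization is replaced here by plain random search over subsets.

```python
import random

def bridge(examples, evaluate, regenerate, rounds=3, subset_size=4, trials=16):
    """Hypothetical sketch of BRIDGE's optimize/generate loop.

    evaluate(subset) -> float: proxy score for the model's performance
    when `subset` is used as the in-context demonstrations.
    regenerate(subset, n) -> list: n new examples seeded from the
    influential subset (in the paper, produced by the LLM itself).
    """
    pool = list(examples)
    for _ in range(rounds):
        # "Optimize" step: search for an influential subset of demonstrations.
        # The paper uses Bayesian optimization; random search is a stand-in.
        best_subset, best_score = None, float("-inf")
        for _ in range(trials):
            subset = random.sample(pool, min(subset_size, len(pool)))
            score = evaluate(subset)
            if score > best_score:
                best_subset, best_score = subset, score
        # "Generate" step: expand the influential subset back to the
        # many-shot regime by regenerating examples from it.
        pool = best_subset + regenerate(best_subset, len(examples) - len(best_subset))
    return pool
```

The key design point the sketch captures is that the demonstration pool is not fixed: each round shrinks it to its most influential members and then regrows it from those seeds, so quality compounds across rounds rather than relying on raw quantity.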
Problem

Research questions and friction points this paper is trying to address.

Many-shot In-context Learning
Learning Effectiveness
Number of Examples
Innovation

Methods, ideas, or system contributions that make the work stand out.

BRIDGE algorithm
many-shot in-context learning
enhanced language model performance
Xingchen Wan
Google
Han Zhou
Google Cloud AI Research, University of Cambridge
Ruoxi Sun
Google DeepMind
Hootan Nakhost
Google
Ke Jiang
Google Cloud AI Research
Sercan Ö. Arik
Google Cloud AI Research