Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents

📅 2026-04-08

🤖 AI Summary

This work addresses the large swings in LLM agent performance across tasks that arise from relying on a single fixed reasoning paradigm instead of adapting the strategy to the task. The study formulates reasoning-paradigm selection as a task-level learned routing problem and introduces a lightweight embedding-based router that, given the input task, chooses among six paradigms: Direct, Chain-of-Thought (CoT), ReAct, Plan-Execute, Reflection, and ReCode. Across four frontier LLMs, the router raises average accuracy from 47.6% to 53.1%, outperforms the best fixed paradigm (50.3%) by 2.8 percentage points, and closes up to 37% of the gap to oracle per-task selection.
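The summary does not define how the "gap to the oracle" is measured. A common reading normalizes the router's gain over the best fixed paradigm by the oracle's gain over that same baseline. The function below is a minimal sketch under that assumed definition; the name `oracle_gap_closed` is hypothetical, and the usage numbers are toy values, not results from the paper.

```python
def oracle_gap_closed(router_acc: float, best_fixed_acc: float, oracle_acc: float) -> float:
    """Fraction of the (oracle - best fixed) gap recovered by a router.

    Assumed definition (not confirmed by the source):
        (router_acc - best_fixed_acc) / (oracle_acc - best_fixed_acc)
    All three accuracies must use the same units (e.g. percent).
    """
    gap = oracle_acc - best_fixed_acc
    if gap <= 0:
        # Without a positive oracle gap the ratio is undefined.
        raise ValueError("oracle accuracy must exceed the best fixed paradigm")
    return (router_acc - best_fixed_acc) / gap


if __name__ == "__main__":
    # Toy numbers in percent: the router recovers 5 of a 10-point gap.
    print(oracle_gap_closed(55.0, 50.0, 60.0))
```

Because "up to 37%" is reported as a per-model maximum, plugging in averaged accuracies would not necessarily reproduce that figure.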

📝 Abstract

When an LLM-based agent improves on a task, is the gain from the model itself or from the reasoning paradigm wrapped around it? We study this question by comparing six inference-time paradigms (Direct, CoT, ReAct, Plan-Execute, Reflection, and ReCode) across four frontier LLMs and ten benchmarks, yielding roughly 18,000 runs. We find that reasoning structure helps dramatically on some tasks but hurts on others: ReAct improves over Direct by 44pp on GAIA, while CoT degrades performance by 15pp on HumanEval. No single paradigm dominates, and oracle per-task selection beats the best fixed paradigm by 17.1pp on average. Motivated by this complementarity, we propose a select-then-solve approach: before answering each task, a lightweight embedding-based router selects the most suitable paradigm. Across four models, the router improves average accuracy from 47.6% to 53.1%, outperforming the best fixed paradigm (50.3%) by 2.8pp and recovering up to 37% of the oracle gap. In contrast, zero-shot self-routing works only for GPT-5 (67.1%) and fails for weaker models, and every self-routing result trails the learned router. Our results argue that reasoning paradigm selection should be a per-task decision made by a learned router, not a fixed architectural choice.
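The abstract describes the router only as "lightweight" and "embedding-based". The sketch below assumes one simple instance of that idea, a k-nearest-neighbour rule: each training task has a precomputed embedding and a binary solved/failed label per paradigm, and a new task is routed to the paradigm that solved the most of its k most similar training tasks. It also computes the two reference points the abstract compares against, the best fixed paradigm and oracle per-task selection. All names (`KNNParadigmRouter`, `cosine`, `best_fixed_accuracy`, `oracle_accuracy`) and the toy 2-D vectors are hypothetical stand-ins, not the authors' implementation or a real embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple

# The six paradigms named in the paper.
PARADIGMS = ["Direct", "CoT", "ReAct", "Plan-Execute", "Reflection", "ReCode"]

# Assumed label format: 1 = paradigm solved the task, 0 = failed.
# Paradigms missing from a dict are treated as failures.
Outcome = Dict[str, int]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KNNParadigmRouter:
    """Hypothetical kNN router: pick the paradigm that solved the most of
    the k training tasks whose embeddings are closest to the new task."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.examples: List[Tuple[Sequence[float], Outcome]] = []

    def fit(self, embeddings: List[List[float]], outcomes: List[Outcome]) -> "KNNParadigmRouter":
        # "Training" just stores labelled embeddings, which keeps the router lightweight.
        self.examples = list(zip(embeddings, outcomes))
        return self

    def route(self, embedding: Sequence[float]) -> str:
        # Rank stored tasks by similarity to the query and keep the top k.
        ranked = sorted(self.examples, key=lambda ex: cosine(embedding, ex[0]), reverse=True)
        scores = {p: 0 for p in PARADIGMS}
        for _, outcome in ranked[: self.k]:
            for p in PARADIGMS:
                scores[p] += outcome.get(p, 0)
        # max() keeps the first maximum, so ties go to the earliest paradigm in PARADIGMS.
        return max(PARADIGMS, key=lambda p: scores[p])


def best_fixed_accuracy(outcomes: List[Outcome]) -> float:
    """Accuracy of the single paradigm that does best when used on every task."""
    return max(sum(o.get(p, 0) for o in outcomes) / len(outcomes) for p in PARADIGMS)


def oracle_accuracy(outcomes: List[Outcome]) -> float:
    """Accuracy if the best paradigm could be chosen separately for each task."""
    return sum(max(o.get(p, 0) for p in PARADIGMS) for o in outcomes) / len(outcomes)


# Toy data for illustration only: two "coding-like" and two "tool-use-like" tasks.
TRAIN_EMB = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
TRAIN_OUT = [{"Direct": 1, "ReCode": 1}, {"Direct": 1}, {"ReAct": 1}, {"ReAct": 1, "Plan-Execute": 1}]

if __name__ == "__main__":
    router = KNNParadigmRouter(k=2).fit(TRAIN_EMB, TRAIN_OUT)
    print(router.route([1.0, 0.0]))  # Direct
    print(router.route([0.0, 1.0]))  # ReAct
    print(best_fixed_accuracy(TRAIN_OUT), oracle_accuracy(TRAIN_OUT))  # 0.5 1.0
```

On this toy data no single paradigm solves more than half the tasks, while per-task selection solves all of them; this is the kind of complementarity the abstract reports. A trained classifier over embeddings would be an equally plausible reading of "embedding-based router"; kNN is used here only because it needs no training loop.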

Problem

Research questions and friction points this paper is trying to address.

reasoning paradigm, LLM agents, paradigm selection, inference-time optimization, task-specific routing

Innovation

Methods, ideas, or system contributions that make the work stand out.

paradigm routing, inference-time optimization, LLM agents