TARSE: Test-Time Adaptation via Retrieval of Skills and Experience for Reasoning Agents

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a novel medical reasoning framework that addresses the limitations of current language models in complex clinical decision-making, where failures often stem from an inability to effectively access procedural knowledge and past cases. The approach formulates clinical question answering as an agent-based task and introduces an explicit yet unified retrieval mechanism that jointly accesses a structured skill repository—comprising guidelines and protocols—and an experience bank of verified reasoning trajectories. By integrating step-aware retrieval with a lightweight test-time adaptation module, the framework aligns the model’s intermediate reasoning steps with established clinical logic. Evaluated across multiple medical question-answering benchmarks, the method significantly outperforms strong retrieval-augmented generation (RAG) and prompt-only baselines, yielding improvements in accuracy, reliability, and traceability of clinical reasoning.

📝 Abstract
Complex clinical decision making often fails not because a model lacks facts, but because it cannot reliably select and apply the right procedural knowledge and the right prior example at the right reasoning step. We frame clinical question answering as an agent problem with two explicit, retrievable resources: skills, reusable clinical procedures such as guidelines, protocols, and pharmacologic mechanisms; and experience, verified reasoning trajectories from previously solved cases (e.g., chain-of-thought solutions and their step-level decompositions). At test time, the agent retrieves both relevant skills and experiences from curated libraries and performs lightweight test-time adaptation to align the language model's intermediate reasoning with clinically valid logic. Concretely, we build (i) a skills library from guideline-style documents organized as executable decision rules, (ii) an experience library of exemplar clinical reasoning chains indexed by step-level transitions, and (iii) a step-aware retriever that selects the most useful skill and experience items for the current case. We then adapt the model on the retrieved items to reduce instance-step misalignment and to prevent reasoning from drifting toward unsupported shortcuts. Experiments on medical question-answering benchmarks show consistent gains over strong medical RAG baselines and prompting-only reasoning methods. Our results suggest that explicitly separating and retrieving clinical skills and experience, and then aligning the model at test time, is a practical approach to more reliable medical agents.
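The step-aware retriever described in (iii) can be illustrated with a minimal sketch. This is not the paper's implementation: the interface, the toy bag-of-words similarity, and the example skill/experience items below are all assumptions for illustration; a real system would use a dense encoder and the paper's curated libraries.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy bag-of-words vector; stands in for a dense retrieval encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def step_aware_retrieve(step_text, skills, experiences, k=1):
    """For the current reasoning step, select the top-k most relevant
    items from the skills library and the experience library jointly.
    (Hypothetical interface sketching the paper's step-aware retriever.)"""
    q = embed(step_text)
    def rank(items):
        return sorted(items, key=lambda it: cosine(q, embed(it)), reverse=True)[:k]
    return rank(skills), rank(experiences)

# Illustrative, hand-written library entries (not from the paper).
skills = [
    "beta blocker contraindicated in acute decompensated heart failure",
    "first line therapy for community acquired pneumonia is amoxicillin",
]
experiences = [
    "case: patient with pneumonia treated with amoxicillin, resolved",
    "case: heart failure patient, beta blocker held during decompensation",
]

top_skills, top_exps = step_aware_retrieve(
    "choose antibiotic for community acquired pneumonia", skills, experiences)
print(top_skills[0])  # the pneumonia-therapy rule
print(top_exps[0])    # the matching solved case
```

In the full method, the retrieved skill and experience items would then drive a lightweight test-time adaptation step to align the model's intermediate reasoning, rather than being injected only as prompt context.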
Problem

Research questions and friction points this paper is trying to address.

clinical reasoning
procedural knowledge
case experience
reasoning alignment
medical decision making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-Time Adaptation
Retrieval-Augmented Reasoning
Clinical Skills
Reasoning Experience
Step-Aware Retrieval