🤖 AI Summary
This work addresses the challenge of extracting critical attributes, such as treatment type and concomitant medications, from unstructured descriptions in clinical trial tables, a task where existing large language models often err when the reasoning plan is left implicit. The authors propose SCOPE, a framework that formulates this problem explicitly as a table understanding task and introduces a multi-LLM collaborative planning mechanism. SCOPE decomposes reasoning into three phases: row selection, structured planning, and execution, with the planner identifying source fields, reasoning rules, and output constraints before generation. Evaluated on 1,500 hybrid reasoning questions over oncology clinical-trial tables, SCOPE improves accuracy over baselines including zero-shot prompting, chain-of-thought reasoning, and TableGPT2, and offers a stronger accuracy-efficiency tradeoff than heavier agentic baselines.
📝 Abstract
We study clinical trial table reasoning, where answers are not directly stored in visible cells but must be inferred through normalization, classification, extraction, or lightweight domain reasoning. Motivated by the observation that current LLM approaches often suffer from "bad reasoning" when planning is left implicit, we focus on settings in which the model must recover implicit attributes such as therapy type, added agents, endpoint roles, or follow-up status from partially observed clinical-trial tables. We propose SCOPE (Structured Clinical hybrid Planning for Evidence retrieval in clinical trials), a multi-LLM planner-based framework that decomposes the task into row selection, structured planning, and execution. The planner makes the source field, reasoning rules, and output constraints explicit before answer generation, reducing ambiguity relative to direct prompting. We evaluate SCOPE on 1,500 hybrid reasoning questions over oncology clinical-trial tables against zero-shot, few-shot, chain-of-thought, TableGPT2, Blend-SQL, and EHRAgent. Results show that explicit multi-LLM planning improves accuracy for reasoning-based questions while offering a stronger accuracy-efficiency tradeoff than heavier agentic baselines. Our findings position clinical trial reasoning as a distinct table understanding problem and highlight hybrid planner-based decomposition as an effective solution.
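The three-phase decomposition described in the abstract (row selection, structured planning, execution) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Plan` fields, the function names, the column names (`trial_id`, `arm_description`), and the toy table are all hypothetical, and the planner and executor LLM calls are replaced by simple rule-based stand-ins so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Plan:
    """Hypothetical plan object: what the planner makes explicit before answering."""
    source_field: str           # column the answer should be derived from
    rule: str                   # reasoning rule, stated in natural language
    allowed_outputs: List[str]  # output constraint on the final answer


def select_rows(table: List[Row], trial_id: str) -> List[Row]:
    """Phase 1: keep only the rows relevant to the question (here, by trial id)."""
    return [row for row in table if row["trial_id"] == trial_id]


def make_plan(question: str) -> Plan:
    """Phase 2: stand-in for a planner LLM; returns a fixed plan for this toy question."""
    return Plan(
        source_field="arm_description",
        rule="combination if more than one agent is named, else monotherapy",
        allowed_outputs=["monotherapy", "combination"],
    )


def toy_executor(rule: str, evidence: str) -> str:
    """Stand-in for an executor LLM: treats '+' as the separator between agents."""
    return "combination" if "+" in evidence else "monotherapy"


def execute(plan: Plan, rows: List[Row], executor: Callable[[str, str], str]) -> str:
    """Phase 3: apply the rule to the planned source field and enforce the constraint."""
    evidence = " | ".join(row[plan.source_field] for row in rows)
    answer = executor(plan.rule, evidence)
    if answer not in plan.allowed_outputs:
        raise ValueError(f"answer {answer!r} violates the output constraint")
    return answer


def run_pipeline(table: List[Row], trial_id: str, question: str) -> str:
    rows = select_rows(table, trial_id)
    plan = make_plan(question)
    return execute(plan, rows, toy_executor)


# Toy table: the therapy type is not stored in any cell and must be inferred.
table = [
    {"trial_id": "T1", "arm_description": "pembrolizumab + carboplatin"},
    {"trial_id": "T2", "arm_description": "nivolumab"},
]

answer = run_pipeline(table, "T1", "What is the therapy type?")
print(answer)  # combination
```

The point of the sketch is the ordering: the source field, rule, and allowed outputs are fixed before any answer is generated, so an out-of-vocabulary answer is rejected rather than silently returned.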