THREAD: Thinking Deeper with Recursive Spawning

📅 2024-05-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the degradation in reasoning performance of large language models (LLMs) on long-context, high-complexity tasks, this paper proposes Thinking Recursively and Dynamically (THREAD). THREAD frames inference as an executable thread that, based on the context, can run to completion or dynamically spawn lightweight child threads on demand, enabling adaptive task decomposition, information retrieval, and result aggregation. Departing from conventional single-chain reasoning, THREAD lets the model scale its intermediate computation as needed, and is implemented as a model-agnostic, few-shot prompt-based framework compatible with GPT-4, GPT-3.5, Llama-3, and CodeLlama. Evaluated on ALFWorld, TextCraft, WebShop, and two newly constructed benchmarks—DataCommons QA and MIMIC-III ICU QA—THREAD achieves state-of-the-art (SOTA) performance, and it outperforms existing frameworks by 10 to 50 absolute percentage points with smaller models such as Llama-3-8b and CodeLlama-7b.

📝 Abstract
Large language models (LLMs) have shown impressive capabilities across diverse settings, but still struggle as the length and complexity of the context increases. To address this challenge, we propose Thinking Recursively and Dynamically (ThReaD). THREAD frames model generation as a thread of execution that, based on the context, can run to completion or dynamically spawn new threads. By spawning, threads can offload work (e.g., thinking, retrieving information) to child threads, which only return tokens needed for the parent thread to do its work. In effect, this enables the model to adapt, as needed, the amount of intermediate work used to produce tokens. We apply THREAD in the settings of LLM task solving and question answering, where the dynamic threading allows the model to recursively decompose the given task or question into progressively simpler sub-problems that can be solved by separate child threads. We test THREAD, implemented using a few-shot learning approach, on diverse benchmarks for agent tasks and data-grounded question answering. THREAD achieves state-of-the-art performance with GPT-4 and GPT-3.5 on these benchmarks, including ALFWorld, TextCraft, and WebShop, along with two new benchmarks, DataCommons QA and MIMIC-III ICU QA. In addition, THREAD outperforms existing frameworks by 10% to 50% absolute points with smaller models, including Llama-3-8b and CodeLlama-7b.
Problem

Research questions and friction points this paper is trying to address.

Improving LLM performance on long, complex contexts
Recursively decomposing tasks into simpler sub-problems
Enhancing task solving and question answering accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recursive thread spawning for dynamic workload allocation
Decomposition of tasks into progressively simpler sub-problems
Model-agnostic, few-shot implementation with adaptive intermediate work
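The control flow the abstract describes—a thread of execution that can either run to completion or spawn child threads whose results are returned to the parent—can be sketched in a few lines. This is an illustrative sketch only: the `generate` stub, the `SPAWN:` marker, and `run_thread` are assumptions made for demonstration, not the paper's actual prompt format or API.

```python
def generate(prompt: str) -> str:
    """Stub standing in for an LLM call. It 'decides' to spawn child
    threads for compound questions (joined by ' and ') and answers
    simple ones directly. A real system would prompt a model here."""
    if " and " in prompt:
        left, right = prompt.split(" and ", 1)
        return f"SPAWN: {left} | {right}"
    return f"answer({prompt})"

def run_thread(prompt: str, depth: int = 0, max_depth: int = 3) -> str:
    """Run one thread of execution. If the model's output requests a
    spawn, recursively run each child thread and aggregate only the
    result tokens the parent needs; otherwise run to completion."""
    out = generate(prompt)
    if out.startswith("SPAWN:") and depth < max_depth:
        subtasks = [s.strip() for s in out[len("SPAWN:"):].split("|")]
        # Each child thread returns only its final tokens to the parent.
        results = [run_thread(s, depth + 1, max_depth) for s in subtasks]
        return "; ".join(results)
    return out
```

Note how the recursion itself is what makes the decomposition adaptive: a simple prompt finishes in one thread, while a compound one spawns as many levels of children as it needs (up to `max_depth`), mirroring the "dynamic scaling of intermediate work" claim.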