Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of balancing accuracy and efficiency in multi-intent natural language understanding, where existing retrieval methods either lose recall through single-step retrieval or waste computation through fixed-depth hierarchical decomposition. The paper proposes Adaptive Tree-of-Retrieval (Adaptive ToR), an architecture that uses query-complexity-aware routing to choose dynamically between a single-step path and an adaptive-depth hierarchical path. It combines recursive query decomposition, two-stage pruning (quantitative similarity gating followed by semantic relevance evaluation), and a reranking layer that deduplicates candidates first and then applies global LLM rescoring. Evaluated on the NLU++ benchmark, the method achieves 29.07% Subset Accuracy and 71.79% Micro-F1, a 9.7% relative improvement over fixed-depth baselines, while reducing latency by 37.6%, LLM invocations by 43.0%, and token consumption by 9.8%, which the authors describe as a Pareto-optimal trade-off among accuracy, latency, and resource usage.
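For readers unfamiliar with the two reported metrics, the sketch below uses their standard multi-label definitions: Subset Accuracy counts a query as correct only when the predicted intent set equals the gold set exactly, while Micro-F1 pools true positives, false positives, and false negatives across all queries. The function names and the toy intent labels are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Set


def subset_accuracy(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Fraction of queries whose predicted intent set matches the gold set exactly."""
    if not gold:
        return 0.0
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold)


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """F1 computed from label-level counts pooled over all queries."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # intents predicted and present in gold
        fp += len(p - g)   # intents predicted but absent from gold
        fn += len(g - p)   # gold intents that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical intent labels, not from NLU++).
gold = [{"transfer", "balance"}, {"booking"}, {"cancel", "refund"}]
pred = [{"transfer", "balance"}, {"booking", "refund"}, {"cancel"}]

print(subset_accuracy(gold, pred))  # only the first query matches exactly
print(micro_f1(gold, pred))         # partial matches still earn credit
```

Because Micro-F1 gives credit for partially correct intent sets and Subset Accuracy does not, a large gap between the two figures (as in the 29.07% versus 71.79% reported here) is expected on multi-intent data.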

📝 Abstract

Multi-intent natural language understanding requires retrieval systems that simultaneously achieve high accuracy and computational efficiency, yet existing approaches apply either uniform single-step retrieval that compromises recall or fixed-depth hierarchical decomposition that introduces excessive latency regardless of query complexity. This paper proposes Adaptive Tree-of-Retrieval (Adaptive ToR), a complexity-aware retrieval architecture that dynamically configures retrieval topology based on query characteristics. The system integrates four components: (1) a Query Tree Classifier computing a Query Complexity Index from weighted linguistic signals to route queries to either a rapid single-step path or an adaptive-depth hierarchical path; (2) a Tree-Based Retrieval module that recursively decomposes complex queries into focused sub-queries calibrated to predicted complexity; (3) an Adaptive Pruning Module employing two-stage filtering combining quantitative similarity gating with semantic relevance evaluation to suppress exponential node growth; and (4) a Retrieval Reranking Layer featuring a deduplicator-first pipeline and global LLM rescoring for production efficiency. Evaluation on the NLU++ benchmark (2,693 multi-intent queries across Banking and Hotel domains) yields 29.07% Subset Accuracy and 71.79% Micro-F1, a 9.7% relative improvement over fixed-depth baselines, while reducing latency by 37.6%, LLM invocations by 43.0%, and token consumption by 9.8%. Depth-wise analysis reveals that 26.92% of queries resolve within three seconds (2.45s mean latency) via single-step routing (d=0: 37.9% Subset Accuracy, 74.8% Micro-F1), while token consumption scales by 4.9x across depths, validating complexity-aware resource allocation and establishing Pareto-optimal balance across accuracy, latency, and computational efficiency.
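The routing idea in component (1) can be illustrated with a minimal sketch. The abstract does not list the linguistic signals, their weights, the routing threshold, or the maximum depth, so every constant and signal below (`WEIGHTS`, `SINGLE_STEP_THRESHOLD`, `MAX_DEPTH`, the conjunction list) is a hypothetical placeholder chosen only to show the shape of a weighted Query Complexity Index that maps to a retrieval depth.

```python
import re
from dataclasses import dataclass

# Hypothetical signal weights and thresholds; the paper's actual values are not given here.
WEIGHTS = {"conjunctions": 0.5, "clauses": 0.3, "length": 0.2}
SINGLE_STEP_THRESHOLD = 0.3   # below this, use the single-step path (d=0)
MAX_DEPTH = 3                 # assumed cap on hierarchical depth
CONJUNCTIONS = {"and", "also", "then", "plus", "but"}


@dataclass
class Route:
    qci: float   # Query Complexity Index in [0, 1]
    depth: int   # 0 = single-step retrieval, >= 1 = hierarchical


def query_complexity_index(query: str) -> float:
    """Weighted sum of simple surface signals, each normalized to [0, 1]."""
    tokens = query.lower().split()
    conj = sum(t.strip(",.?!;") in CONJUNCTIONS for t in tokens)
    clauses = len(re.findall(r"[,;?]", query))
    signals = {
        "conjunctions": min(conj / 2, 1.0),
        "clauses": min(clauses / 3, 1.0),
        "length": min(len(tokens) / 30, 1.0),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


def route(query: str) -> Route:
    """Send simple queries to d=0 and scale depth with complexity otherwise."""
    qci = query_complexity_index(query)
    if qci < SINGLE_STEP_THRESHOLD:
        return Route(qci, 0)
    # Spread the remaining QCI range linearly over depths 1..MAX_DEPTH.
    span = (qci - SINGLE_STEP_THRESHOLD) / (1 - SINGLE_STEP_THRESHOLD)
    depth = 1 + min(int(span * MAX_DEPTH), MAX_DEPTH - 1)
    return Route(qci, depth)


simple = route("What is my balance?")
compound = route(
    "I want to cancel my booking, and also get a refund, "
    "then book a new room for Friday"
)
print(simple.depth, compound.depth)
```

The design point this sketch tries to capture is that the classifier is cheap relative to retrieval, so spending a few string operations up front can spare short queries from the cost of hierarchical decomposition.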

Problem

Research questions and friction points this paper is trying to address.

multi-intent NLU

retrieval efficiency

query complexity

computational latency

Pareto-optimal balance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive ToR

complexity-aware retrieval

tree-based decomposition
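Tree-based decomposition combined with the two-stage pruning described in the abstract might look roughly like the sketch below. It is an assumption-laden outline rather than the authors' implementation: token-overlap (Jaccard) similarity stands in for whatever similarity measure the paper uses, the `relevance` callable stands in for the semantic relevance evaluation, and `decompose`, `sim_gate`, and `rel_gate` are hypothetical names and values.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for an embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def prune(
    parent: str,
    candidates: List[str],
    relevance: Callable[[str, str], float],
    sim_gate: float = 0.1,
    rel_gate: float = 0.5,
) -> List[str]:
    """Two-stage filter: cheap similarity gate, then a costlier relevance check."""
    # Stage 1: drop candidates that share too little with the parent query.
    survivors = [c for c in candidates if jaccard(parent, c) >= sim_gate]
    # Stage 2: run the expensive scorer only on what survived stage 1.
    return [c for c in survivors if relevance(parent, c) >= rel_gate]


def expand(
    query: str,
    depth: int,
    decompose: Callable[[str], List[str]],
    relevance: Callable[[str, str], float],
) -> List[str]:
    """Recursively decompose a query to the given depth and return the leaf sub-queries."""
    if depth == 0:
        return [query]
    children = prune(query, decompose(query), relevance)
    if not children:
        return [query]  # nothing useful to split into; keep the query itself
    leaves: List[str] = []
    for child in children:
        leaves.extend(expand(child, depth - 1, decompose, relevance))
    return leaves


# Toy stand-ins (hypothetical): split on " and ", and accept every survivor.
def split_on_and(q: str) -> List[str]:
    return [part.strip() for part in q.split(" and ") if part.strip()]


def always_relevant(parent: str, child: str) -> float:
    return 1.0


query = "cancel my booking and refund my booking"
print(expand(query, 2, split_on_and, always_relevant))
print(prune(query, ["cancel my booking", "weather today"], always_relevant))
```

Pruning at every level keeps the number of nodes from multiplying with depth, which is the growth the Adaptive Pruning Module is described as suppressing.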