🤖 AI Summary
This work addresses the high token cost and strict latency constraints that large language models face when processing long clinical texts. The authors formulate budget-constrained context selection as a subset selection problem under a knapsack constraint and introduce RCD, a monotone submodular objective that balances relevance, coverage, and diversity. They also propose a budget-adaptive routing heuristic and compare four unitization (chunking) strategies (sentences, sections, sliding windows, and clusters) alongside submodular optimization and MMR-based diversity methods. Experiments on MIMIC discharge notes, Cochrane abstracts, and L-Eval show that positional heuristics perform best at low budgets in extractive tasks, that diversity-aware methods such as MMR improve LLM generation, and that ROUGE saturates on LLM summaries while BERTScore better reflects quality differences. Selector choice matters more than the unitization scheme: cluster-based grouping reduces performance, and the other schemes behave similarly.
📝 Abstract
A key challenge for large language models is cost, both the token cost per query and the overall cost of deployment. Clinical inputs are long, heterogeneous, and often redundant, while downstream tasks are short and high-stakes. We study budgeted context selection, in which a subset of document units is chosen under a strict token budget so that an off-the-shelf generator can meet fixed cost and latency constraints. We cast this as a knapsack-constrained subset selection problem with two design choices: unitization, which defines how the document is segmented, and selection, which determines which units are kept.
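As a reading aid, the knapsack-constrained selection problem can be written compactly. The notation here is assumed for illustration and is not taken from the paper: $U$ is the set of units produced by unitization, $c(u)$ is the token cost of unit $u$, $B$ is the token budget, and $f$ is a set objective that scores a candidate selection.

```latex
\[
  S^{\star} \in \arg\max_{S \subseteq U} f(S)
  \quad \text{subject to} \quad \sum_{u \in S} c(u) \le B
\]
```

Under this reading, unitization fixes $U$ and the costs $c(u)$, while the selector determines how $S$ is chosen.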
We propose RCD, a monotone submodular objective that balances relevance, coverage, and diversity. We compare sentence, section, window, and cluster-based unitization, and introduce a routing heuristic that adapts to the budget regime. Experiments on MIMIC discharge notes, Cochrane abstracts, and L-Eval show that optimal strategies depend on the evaluation setting. Positional heuristics perform best at low budgets in extractive tasks, while diversity-aware methods such as MMR improve LLM generation. Selector choice matters more than unitization, with cluster-based grouping reducing performance and other schemes behaving similarly. ROUGE saturates for LLM summaries, while BERTScore better reflects quality differences. We release our code at https://github.com/stone-technologies/ACL_budget_paper.
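To make the idea of diversity-aware selection under a token budget concrete, the following is a minimal sketch of greedy MMR with a budget check. It is not the authors' implementation and it does not implement RCD: the function names (`budgeted_mmr`, `bow`, `cosine`), the bag-of-words cosine similarity, the whitespace token count, and the trade-off weight `lam` are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a real embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def budgeted_mmr(units, query, budget, lam=0.7):
    """Greedy MMR under a token budget.

    Returns the indices of the selected units in document order.
    Token cost is approximated by whitespace word count (an assumption).
    """
    q = bow(query)
    vecs = [bow(u) for u in units]
    costs = [len(u.split()) for u in units]
    selected, used = [], 0
    remaining = set(range(len(units)))

    def score(i):
        relevance = cosine(vecs[i], q)
        # Redundancy is the highest similarity to anything already chosen.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining:
        # Only units that still fit within the budget are candidates.
        feasible = [i for i in remaining if used + costs[i] <= budget]
        if not feasible:
            break
        # Ties are broken in favour of the earlier unit.
        best = max(feasible, key=lambda i: (score(i), -i))
        selected.append(best)
        used += costs[best]
        remaining.remove(best)
    return sorted(selected)


units = [
    "patient admitted with chest pain",
    "patient admitted with chest pain",  # exact duplicate of the first unit
    "discharged on aspirin and statin",
    "family history of diabetes",
]
print(budgeted_mmr(units, "chest pain aspirin", budget=10))  # [0, 2]
```

In this toy input the duplicate sentence is skipped in favour of a less relevant but non-redundant unit, which is the behaviour the redundancy penalty is meant to produce. A plain relevance ranking with the same budget would spend half of it on the duplicate.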