🤖 AI Summary
This work addresses the inefficiency and high cost of traditional model cascades for processing unstructured documents, which overlook document locality and task relevance. The authors propose a task cascade framework that extends conventional model cascading into a dynamic task-level paradigm: large language model (LLM) agents select the relevant portions of each document, decompose or simplify the original operation, and schedule the resulting subtasks adaptively. The framework also includes a statistical guarantee mechanism that meets a user-defined accuracy target (with respect to the oracle) up to a bounded failure probability. Evaluated across eight real-world document processing tasks at a 90% accuracy target, the method reduces end-to-end inference cost by 36% on average, substantially outperforming traditional model cascade strategies.
📝 Abstract
Modern database systems allow users to query or process unstructured text or document columns using LLM-powered functions. Users can express an operation in natural language (e.g., "identify if this review mentions billing issues"), with the system executing the operation on each document in a row-by-row fashion. One way to reduce cost on a batch of documents is to employ the model cascade framework: a cheap proxy model processes each document, and only uncertain cases are escalated to a more accurate, expensive oracle. However, model cascades miss important optimization opportunities; for example, often only part of a document is needed to answer a query, or other related but simpler operations (e.g., "is the review sentiment negative?", "does the review mention money?") can be handled by cheap models more effectively than the original operation, while still being correlated with it. We introduce the task cascades framework, which generalizes model cascades by varying not just the model, but also the document portion and operation at each stage. Our framework uses an LLM agent to generate simplified, decomposed, or otherwise related operations and selects the most relevant document portions, constructing hundreds of candidate tasks from which it assembles a task cascade. We show that optimal cascade selection is intractable via reduction from Minimum Sum Set Cover, but our iterative approach constructs effective cascades. We also provide an extension that offers statistical accuracy guarantees: the resulting cascade meets a user-defined accuracy target (with respect to the oracle) up to a bounded failure probability. Across eight real-world document processing tasks at a 90% target accuracy, task cascades reduce end-to-end cost by an average of 36% compared to model cascades, at production scale.
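To make the cascade mechanic concrete, here is a minimal sketch (not the paper's implementation) of the core idea: each stage pairs a document portion with a cheap, possibly simplified operation, and only documents whose cheap answers fall below a confidence threshold are escalated to the expensive oracle. The stage functions, thresholds, and relative costs below are hypothetical stand-ins for LLM calls.

```python
def run_task_cascade(docs, stages, oracle):
    """Each stage is (select, operate, threshold):
    - select(doc): picks the relevant portion of the document
    - operate(text): returns (label, confidence) from a cheap model/operation
    - threshold: minimum confidence to accept the cheap answer
    Documents unresolved by any stage fall through to the oracle."""
    answers, cost = {}, 0
    pending = list(range(len(docs)))
    for select, operate, threshold in stages:
        still_pending = []
        for i in pending:
            label, conf = operate(select(docs[i]))
            cost += 1  # one cheap-model call
            if conf >= threshold:
                answers[i] = label
            else:
                still_pending.append(i)
        pending = still_pending
    for i in pending:  # escalate the rest to the expensive oracle
        answers[i] = oracle(docs[i])
        cost += 10  # assume an oracle call costs ~10x a cheap call
    return [answers[i] for i in range(len(docs))], cost

# Toy stand-ins for the LLM calls: a simpler correlated operation
# ("does the review mention money/billing?") run on only the first sentence.
docs = ["Charged twice for one order. Awful.",
        "Great food, friendly staff!",
        "The billing page crashed on me."]
stage = (lambda d: d.split(".")[0],  # document portion: first sentence
         lambda t: ("billing", 0.9) if any(w in t.lower()
                   for w in ("charge", "billing")) else ("other", 0.4),
         0.8)
oracle = lambda d: ("billing" if "billing" in d.lower()
                    or "charged" in d.lower() else "other")

labels, cost = run_task_cascade(docs, [stage], oracle)
```

In this toy run, two of the three documents are resolved by the cheap stage and only one reaches the oracle, so total cost is 13 units instead of the 30 an oracle-only pass would incur; the paper's framework builds such cascades automatically from hundreds of agent-generated candidate tasks.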