🤖 AI Summary
This work addresses non-episodic sequential decision-making in mixed textual and numerical contexts, where invoking a large language model (LLM) at every step is costly and uncertainty estimates are hard to obtain. It proposes LLMP-UCB, a bandit algorithm that estimates uncertainty from repeated LLM queries, and compares it against lightweight contextual multi-armed bandits operating on text embeddings (dense or Matryoshka). It also introduces a diagnostic based on the geometric structure of the arm embeddings to decide when LLM-driven reasoning is warranted and when a lightweight numerical bandit suffices. Experiments across multiple financial tasks show that the lightweight bandits match or exceed the accuracy of LLM-based decision-making at a fraction of the cost, offering a deployable solution. The study further shows that embedding dimensionality is a practical lever on the exploration-exploitation trade-off.
📝 Abstract
We study Contextual Multi-Armed Bandits (CMABs) for non-episodic sequential decision-making problems in which the context includes both textual and numerical information (e.g., recommendation systems, dynamic portfolio adjustments, and offer selection, all frequent problems in finance). While Large Language Models (LLMs) are increasingly applied to these settings, using an LLM for reasoning at every decision step is computationally expensive, and uncertainty estimates are difficult to obtain. To address this, we introduce LLMP-UCB, a bandit algorithm that derives uncertainty estimates from LLMs via repeated inference. Our experiments nevertheless demonstrate that lightweight numerical bandits operating on text embeddings (dense or Matryoshka) match or exceed the accuracy of LLM-based solutions at a fraction of their cost. We further show that embedding dimensionality is a practical lever on the exploration-exploitation balance, enabling cost-performance tradeoffs without added prompt complexity. Finally, to guide practitioners, we propose a geometric diagnostic based on the arms' embeddings to decide when to use LLM-driven reasoning versus a lightweight numerical bandit. Our results provide a principled deployment framework for cost-effective, uncertainty-aware decision systems with broad applicability across AI use cases in financial services.
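The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact form of the LLMP-UCB bonus, so the score used here (sample mean plus `beta` times the sample standard deviation of repeated predictions) is an assumption, and the function names `ucb_from_samples`, `select_arm`, and `truncate_embedding` are hypothetical. The truncation helper relies only on the general property of Matryoshka embeddings that a prefix of the vector remains a usable embedding.

```python
import math
import statistics


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and re-normalize to unit length.

    Matryoshka-style embeddings are trained so that such prefixes stay
    meaningful; a smaller `dim` gives a cheaper, coarser context vector.
    """
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def ucb_from_samples(samples, beta=1.0):
    """Optimistic score for one arm from repeated reward predictions.

    Assumed form (not taken from the paper): mean + beta * std, where the
    spread across repeated queries stands in for the model's uncertainty.
    """
    mean = statistics.fmean(samples)
    spread = statistics.pstdev(samples) if len(samples) > 1 else 0.0
    return mean + beta * spread


def select_arm(samples_per_arm, beta=1.0):
    """Return the index of the arm with the highest optimistic score."""
    scores = [ucb_from_samples(s, beta) for s in samples_per_arm]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical repeated predictions for two arms (three queries each).
    predictions = [[0.5, 0.5, 0.5], [0.2, 0.6, 0.4]]
    print(select_arm(predictions, beta=0.0))  # pure exploitation
    print(select_arm(predictions, beta=2.0))  # uncertainty bonus included
    print(truncate_embedding([3.0, 4.0, 12.0], 2))
```

In this toy input, the second arm has the lower mean but the larger spread, so raising `beta` shifts the choice toward it. This is the exploration behavior that an upper-confidence-bound rule is meant to produce.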