AI Summary
This work proposes PRISM, a novel framework that addresses the limitations of large language models (LLMs) in high-stakes domains such as finance and healthcare, where noisy probability estimates and opaque decision-making hinder reliable deployment. PRISM introduces Shapley values into LLM-based probabilistic prediction for the first time, enabling factor-level interpretability by quantifying the marginal contribution of each input feature. Leveraging these contributions, the framework recalibrates the model's output probabilities to enhance both accuracy and transparency. Empirical evaluations across finance, healthcare, and agriculture demonstrate that PRISM significantly outperforms standard prompting and other baseline methods. Furthermore, by visualizing the distribution of feature contributions, PRISM fosters greater user trust in model decisions without compromising predictive performance.
Abstract
Large Language Models (LLMs) show potential for estimating the probability of uncertain events by leveraging their extensive knowledge and reasoning capabilities. This ability can support intelligent decision-making across diverse fields, such as financial forecasting and preventive healthcare. However, directly prompting LLMs for probability estimates faces significant challenges: their outputs are often noisy, and the underlying prediction process is opaque. In this paper, we propose PRISM: Probability Reconstruction via Shapley Measures, a framework that brings transparency and precision to LLM-based probability estimation. PRISM decomposes an LLM's prediction by quantifying the marginal contribution of each input factor using Shapley values. These factor-level contributions are then aggregated to reconstruct a calibrated final estimate. In our experiments, we demonstrate that PRISM improves predictive accuracy over direct prompting and other baselines across multiple domains, including finance, healthcare, and agriculture. Beyond performance, PRISM provides a transparent prediction pipeline: our case studies visualize how individual factors shape the final estimate, helping build trust in LLM-based decision support systems.
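The decomposition described in the abstract can be illustrated with a minimal sketch of exact Shapley values over a small set of input factors. This is not the PRISM implementation: the abstract does not specify how factor subsets are presented to the LLM or how contributions are aggregated and calibrated. Here `toy_estimate` is a hypothetical stand-in for an LLM probability query, the factor names are invented for illustration, and the reconstruction simply adds the contributions to the no-factor baseline.

```python
from itertools import combinations
from math import factorial


def shapley_values(factors, value_fn):
    """Exact Shapley value of each factor for the set function value_fn."""
    n = len(factors)
    phi = {}
    for f in factors:
        others = [g for g in factors if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_estimate(subset):
    """Hypothetical stand-in for querying an LLM with only `subset` of the factors."""
    p = 0.20  # assumed baseline estimate with no factors given
    if "high_debt" in subset:
        p += 0.30
    if "falling_revenue" in subset:
        p += 0.20
    if "high_debt" in subset and "falling_revenue" in subset:
        p += 0.10  # assumed interaction between the two factors
    if "new_funding" in subset:
        p -= 0.15
    return p


factors = ["high_debt", "falling_revenue", "new_funding"]
phi = shapley_values(factors, toy_estimate)
baseline = toy_estimate(frozenset())

# Assumed aggregation: baseline plus the sum of factor contributions.
reconstructed = baseline + sum(phi.values())

for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
print(f"reconstructed estimate: {reconstructed:.3f}")
```

In this toy setup the interaction term is split evenly between the two interacting factors, giving contributions of about 0.35, 0.25, and -0.15. With exact Shapley values the sum of contributions equals the full-factor estimate minus the baseline (the efficiency property), so this simple reconstruction reproduces the full-factor estimate; any calibration step in PRISM itself would go beyond what is sketched here. Exact computation needs 2^n subset evaluations, so a real system with many factors would likely need sampling-based approximation.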