CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space

📅 2026-04-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing benchmarks for evaluating large language models (LLMs) on decision-making tasks often overlook the compositional structure of actions and explicit feasibility conditions, which limits how well they capture real-world decision scenarios. This work proposes CONDESION-BENCH, the first conditional decision-making benchmark tailored to compositional action spaces: actions are modeled as allocations to decision variables, subject to explicit conditions at the variable, contextual, and allocation levels. Structured action representations and oracle-based automated evaluation let the benchmark assess both decision quality and condition adherence. This removes the reliance of conventional benchmarks on a fixed set of candidate actions and on the assumption that no feasibility conditions apply, enabling a more rigorous and realistic evaluation of LLMs as decision-support tools in constrained environments.

📝 Abstract
Large language models have been widely explored as decision-support tools in high-stakes domains due to their contextual understanding and reasoning capabilities. However, existing decision-making benchmarks rely on two simplifying assumptions: actions are selected from a finite set of pre-defined candidates, and explicit conditions restricting action feasibility are not incorporated into the decision-making process. These assumptions fail to capture the compositional structure of real-world actions and the explicit conditions that constrain their validity. To address these limitations, we introduce CONDESION-BENCH, a benchmark designed to evaluate conditional decision-making in compositional action space. In CONDESION-BENCH, actions are defined as allocations to decision variables and are restricted by explicit conditions at the variable, contextual, and allocation levels. By employing oracle-based evaluation of both decision quality and condition adherence, we provide a more rigorous assessment of LLMs as decision-support tools.
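
The setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the benchmark's actual schema or metric: the `Condition` and `evaluate` names, the budget-allocation scenario, the specific conditions, and the toy quality score (one minus the normalized distance to an oracle action) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Action = Dict[str, float]    # decision variable name -> allocated amount
Context = Dict[str, object]  # scenario facts that conditions may depend on


@dataclass
class Condition:
    level: str        # "variable", "contextual", or "allocation"
    description: str
    check: Callable[[Action, Context], bool]  # True when the condition holds


def violations(action: Action, context: Context,
               conditions: List[Condition]) -> List[str]:
    """Return the descriptions of all conditions the action breaks."""
    return [c.description for c in conditions if not c.check(action, context)]


def evaluate(action: Action, oracle_action: Action, context: Context,
             conditions: List[Condition]) -> dict:
    """Report condition adherence and a toy decision-quality score."""
    broken = violations(action, context, conditions)
    # Hypothetical quality metric: 1 minus the normalized L1 gap to the oracle.
    gap = sum(abs(action.get(name, 0.0) - value)
              for name, value in oracle_action.items())
    scale = sum(abs(value) for value in oracle_action.values()) or 1.0
    quality = max(0.0, 1.0 - gap / scale)
    return {"feasible": not broken, "violations": broken, "quality": quality}


# Hypothetical scenario: split a budget of 100 across three decision variables.
conditions = [
    Condition("variable", "each allocation is between 0 and 60",
              lambda a, ctx: all(0 <= v <= 60 for v in a.values())),
    Condition("contextual", "ads are capped at 20 during a downturn",
              lambda a, ctx: not ctx["downturn"] or a["ads"] <= 20),
    Condition("allocation", "allocations sum to the full budget of 100",
              lambda a, ctx: abs(sum(a.values()) - 100) < 1e-9),
]

context = {"downturn": True}
oracle_action = {"ads": 20.0, "research": 50.0, "reserve": 30.0}
model_action = {"ads": 30.0, "research": 50.0, "reserve": 20.0}

result = evaluate(model_action, oracle_action, context, conditions)
print(result)
```

In this toy scenario the model's action stays close to the oracle yet breaks the contextual condition, which is why reporting decision quality and condition adherence separately is informative.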
Problem

Research questions and friction points this paper is trying to address.

conditional decision-making
compositional action space
large language models
decision-making benchmark
action feasibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

conditional decision-making
compositional action space
large language models
constraint adherence
decision benchmark
🔎 Similar Papers
2024-06-17 · Conference on Empirical Methods in Natural Language Processing · Citations: 3