DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

📅 2026-05-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Current embodied intelligence systems lack a unified benchmark for jointly evaluating perception, decision-making, and dexterous manipulation in dynamic tabletop environments. This work proposes DexHoldem, the first embodied evaluation framework that combines the rule structure of Texas Hold'em poker with a high-degree-of-freedom dexterous hand (ShadowHand). The framework defines 14 manipulation primitives, supplies 1,470 teleoperated demonstrations, and assesses embodied perception, policy execution, and state recovery in a closed loop. Experiments show that the π₀.₅ policy achieves the highest task completion rate (61.2%), while Opus 4.7 and GPT 5.5 lead on strict problem-level and average field-wise perception accuracy, respectively. The results further reveal a gap between isolated perception sub-capabilities and complete state recovery.
๐Ÿ“ Abstract
Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real-world system-level benchmark built around Texas Hold'em dexterous manipulation with a ShadowHand. DexHoldem provides 1,470 teleoperated demonstrations across 14 Texas Hold'em manipulation primitives, a standardized physical policy benchmark, and an agentic perception benchmark that tests whether agents can recover the structured game state needed for embodied decision making. On primitive execution, $\pi_{0.5}$ obtains the highest task completion rate ($61.2\%$), while $\pi_{0.5}$ and $\pi_0$ tie on scene-preserving success rate ($47.5\%$). On agentic perception, Opus 4.7 obtains the best strict problem-level accuracy ($34.3\%$), while GPT 5.5 obtains the best average field-wise accuracy ($66.8\%$), exposing a gap between isolated visual sub-capabilities and complete routing-relevant state recovery. Finally, we instantiate the full embodied-agent loop in three case studies, where waiting, recovery dispatches, human-help requests, and repeated primitive execution reveal how perception and policy errors accumulate during closed-loop deployment. DexHoldem therefore evaluates dexterous tabletop execution, agentic perception, and embodied decision routing in a shared physical setting. Project page: https://dexholdem.github.io/Dexholdem/.
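The gap between strict problem-level accuracy and average field-wise accuracy can be illustrated with a small sketch. It is not the benchmark's scoring code: the field names (`pot`, `board`, `to_act`), the toy data, and the exact averaging scheme are assumptions chosen for illustration. Here a problem counts as strictly correct only if every field of the recovered game state matches, while field-wise accuracy gives partial credit per field.

```python
from typing import Dict, List

# Hypothetical structured game state: field name -> value.
GameState = Dict[str, str]


def field_wise_accuracy(preds: List[GameState], golds: List[GameState]) -> float:
    """Fraction of matching fields per problem, averaged over problems."""
    per_problem = []
    for pred, gold in zip(preds, golds):
        correct = sum(pred.get(key) == value for key, value in gold.items())
        per_problem.append(correct / len(gold))
    return sum(per_problem) / len(per_problem)


def strict_accuracy(preds: List[GameState], golds: List[GameState]) -> float:
    """Fraction of problems in which every field matches the reference."""
    hits = sum(
        all(pred.get(key) == value for key, value in gold.items())
        for pred, gold in zip(preds, golds)
    )
    return hits / len(golds)


# Toy data (assumed): two problems with three fields each.
golds = [
    {"pot": "120", "board": "Ah Kd 7c", "to_act": "player_2"},
    {"pot": "40", "board": "2s 2d 9h", "to_act": "player_1"},
]
preds = [
    {"pot": "120", "board": "Ah Kd 7c", "to_act": "player_2"},  # all fields right
    {"pot": "40", "board": "2s 2d 9c", "to_act": "player_1"},   # one card misread
]

fw = field_wise_accuracy(preds, golds)
st = strict_accuracy(preds, golds)
print(f"field-wise: {fw:.3f}, strict: {st:.3f}")
```

A single misread card lowers field-wise accuracy only slightly but makes the whole problem count as a failure under the strict metric, which is one way a model can score well on individual fields and still score poorly on complete state recovery.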
Problem

Research questions and friction points this paper is trying to address.

embodied intelligence
dexterous manipulation
Texas Hold'em
agentic perception
tabletop interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

dexterous manipulation
embodied intelligence
agentic perception
system-level benchmark
Texas Hold'em