Proposed GEC, a unified framework for interactive decision making, revealing a representation complexity hierarchy among RL paradigms and proving that quantum computing enables a quadratic speedup in online exploration (Mathematics of Operations Research, 2025+)
Re-examined model-based, policy-based, and value-based RL through the lens of representation complexity (NeurIPS 2024)
Developed a provably efficient quantum RL algorithm with logarithmic worst-case regret (ICML 2024)
Provided the first theoretical analysis of RLHF with function approximation and initiated the study of KL-constrained RLHF, leading to iterative learning and Reinforced Token Optimization (RTO) (ICML 2025, ICML 2024)
Designed the first line of efficient equilibrium-finding algorithms for offline and Stackelberg Markov games (JMLR 2023)
Proposed the first provably efficient preference-based RL algorithm with general function approximation (ICML 2022)