Publications
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF (ICLR, 2025)
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates (NeurIPS, 2024)
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation (ICML, 2024, Spotlight)
Ordering-based Conditions for Global Convergence of Policy Gradient Methods (NeurIPS, 2023, Oral)
Stochastic Gradient Succeeds for Bandits (ICML, 2023)
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice (ICML, 2023)
The Role of Baselines in Policy Gradient Optimization (NeurIPS, 2022)
On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games (NeurIPS, 2022)
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal (partial information)
Research Experience
Research internships at Google Brain (Aug. 2019 - Aug. 2021) and Borealis AI (Sep. 2018 - Jan. 2019 and Mar. 2019 - May 2019).
Education
Ph.D. in Statistical Machine Learning from the University of Alberta (Sep. 2015 - Sep. 2021), supervised by Dale Schuurmans; M.S. from Shanghai Jiao Tong University (Mar. 2015), supervised by Bao-Liang Lu; B.E. from South China University of Technology (June 2012).
Background
Research interests include machine learning, reinforcement learning, and optimization. Currently a Senior Research Scientist at Google DeepMind (previously at Google Brain Canada until April 20, 2023).