Publications
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF (ICLR, 2025)
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates (NeurIPS, 2024)
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation (ICML, 2024, Spotlight)
Ordering-based Conditions for Global Convergence of Policy Gradient Methods (NeurIPS, 2023, Oral)
Stochastic Gradient Succeeds for Bandits (ICML, 2023)
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice (ICML, 2023)
The Role of Baselines in Policy Gradient Optimization (NeurIPS, 2022)
On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games (NeurIPS, 2022)
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal (partial information)
Research Experience
Research internships at Google Brain (Aug. 2019 - Aug. 2021) and Borealis AI (Sep. 2018 - Jan. 2019 and Mar. 2019 - May 2019).
Education
Ph.D. in Statistical Machine Learning from the University of Alberta (Sep. 2015 - Sep. 2021), supervised by Dale Schuurmans; M.S. from Shanghai Jiao Tong University (Mar. 2015), supervised by Bao-Liang Lu; B.E. from South China University of Technology (June 2012).
Background
Research interests include machine learning, reinforcement learning, and optimization. Currently a Senior Research Scientist at Google DeepMind (previously at Google Brain Canada until April 20, 2023).