Rémi Munos

Google Scholar ID: OvKEnVwAAAAJ

FAIR, Meta

deep RL, RLHF, MCTS, bandit theory, statistical learning

Citations & Impact

All-time

Citations

39,897

H-index

i10-index

175

Publications

Co-authors

list available

Publications

18 items

Distilling LLM Feedback for Lean Theorem Proving

2026

Spectral bandits for smooth graph functions with applications in recommender systems

2026

Bandits attack function optimization

2026

Black-box optimization of noisy functions with unknown smoothness

2026

Spectral bandits

2026

Stochastic simultaneous optimistic optimization

2026

Efficient learning by implicit exploration in bandit problems with side observations

2026

Planning in entropy-regularized Markov decision processes and games

2026

Resume (English only)

Academic Achievements

Published the monograph 'From bandits to Monte-Carlo Tree Search: The optimistic principle applied to optimization and planning' (2014). Supervised several PhD students, including Sébastien Bubeck, who went on to become an assistant professor at Princeton, and Odalric-Ambrym Maillard, who received the AFIA 2012 best thesis award. Took on conference roles including an AAAI 2013 tutorial and co-chairing ALT 2013.

Research Experience

Worked at Microsoft Research New England from 2013 to 2014 and with the SequeL team at INRIA Lille - Nord Europe. Participated in several European projects and activities, including PASCAL2, COMPLACS, and CO-ADAPT.

Background

Currently a Senior Researcher at Google DeepMind. Research interests include bandit theory, optimistic algorithms (such as KL-UCB and UCB-V), Thompson sampling, the foundations of Monte-Carlo Tree Search, optimal control, and reinforcement learning.

Miscellany