Completed a set of notes on bandit convex optimisation.
The 'Convex Bandit' paper was accepted by the journal Mathematical Statistics and Learning.
Research results with Andras Gyorgy showed that the stability of mirror descent, for suitable exploration distributions and estimators, is upper bounded by the information ratio.
Two partial monitoring papers were accepted at COLT.
A preprint on bandit convex optimisation improved the dimension-dependence in the information-theoretic rate from d^9.5 to d^2.5.
Co-authored a paper with Shuai Li and Csaba on learning to rank in a linear model, accepted to ICML.
Follow-up with Csaba to the COLT paper on partial monitoring, proposing a simple and efficient algorithm that matches the non-constructive upper bound.
Research Experience
Research scientist at DeepMind, based in London.
Completed a new book on bandits with Csaba, published by Cambridge University Press.
Collaborated with Andras Gyorgy on connections between information-theoretic arguments and adversarial bandits based on mirror descent or follow the regularised leader.
Co-authored a paper with Julian Zimmert showing that the information ratio is upper bounded by stability in certain linear settings.
Worked with Johannes Kirschner and Andreas Krause on a linear version of partial monitoring.
Co-authored a paper with Botao Hao on the asymptotics of linear contextual bandits.
Collaborated with Csaba and Gellert on misspecified linear models in bandit and RL settings.
Developed a new algorithm for tree search.
Contributed to the bsuite RL proving ground.
Conducted research on model selection for contextual bandits.
Background
Research scientist, mainly working on algorithms for sequential decision making, with much of the recent focus on bandits.
Miscellany
Personal interests include drawing posters using GoodNotes and Graphic software.