2025. 'Personalization of Large Language Models: A Survey.' Transactions on Machine Learning Research.
2025. 'GUI Agents: A Survey.' In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).
2025. 'From Selection to Generation: A Survey of LLM-based Active Learning.' In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL).
2025. 'Adaptive Submodular Policy Optimization.' In Proceedings of the 2nd Reinforcement Learning Conference (RLC).
2025. 'FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain.' In Proceedings of the 42nd International Conference on Machine Learning (ICML).
Research Experience
2024–present: Principal Research Scientist at Adobe Research.
2021–2024: Research Scientist at Amazon.
2018–2021: Research Scientist at Google Research.
2014–2018: Research Scientist at Adobe Research.
2011–2014: Research Scientist at Technicolor’s Research Center.
2006–2011: Research Scientist at Intel Research.
Background
Proposes, analyzes, and applies algorithms that learn incrementally, run in real time, and converge to near-optimal solutions as observations increase.
Recent work focuses on applying these ideas to modern generative models and human feedback.
Studies seamless human-machine interaction, a long-standing goal of AI, traditionally approached via reinforcement learning and bandit frameworks.
Made fundamental contributions to the bandit field, especially in structured problems involving graphs, submodularity, semi-bandit feedback, and low-rank matrices.
Developed online learning-to-rank bandit algorithms that handle exponentially large action spaces under partial feedback; the resulting methods are simple, theoretically sound, robust, and state-of-the-art.
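To make this concrete, below is a minimal sketch of a UCB-style ranking bandit in the cascade click model, in the spirit of this line of work; the simulator, the attraction probabilities, and the exploration constant 1.5 are illustrative assumptions, not a published implementation.

    import numpy as np

    def cascade_ucb1_sketch(attraction_probs, K, T, rng=None):
        # Minimal UCB ranking bandit in the cascade click model: the user
        # scans the list top-down and clicks the first attractive item.
        rng = np.random.default_rng() if rng is None else rng
        attraction_probs = np.asarray(attraction_probs)
        L = len(attraction_probs)
        pulls = np.ones(L)  # initialize each item with one simulated pull
        means = rng.binomial(1, attraction_probs).astype(float)
        for t in range(L + 1, T + 1):
            ucb = means + np.sqrt(1.5 * np.log(t) / pulls)  # optimism
            ranking = np.argsort(-ucb)[:K]  # show K items with highest UCBs
            clicks = rng.binomial(1, attraction_probs[ranking])
            first = int(np.argmax(clicks)) if clicks.any() else K
            # Cascade feedback: only items up to the first click are observed.
            for pos in range(min(first + 1, K)):
                item = ranking[pos]
                reward = 1.0 if pos == first else 0.0
                pulls[item] += 1
                means[item] += (reward - means[item]) / pulls[item]
        return np.argsort(-means)[:K]  # final ranking by estimated attraction

For example, cascade_ucb1_sketch(np.array([0.2, 0.5, 0.1, 0.4]), K=2, T=10000) converges to recommending the two most attractive items, despite observing feedback only down to the first click.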
Recent efforts make bandit algorithms more practical through randomization-based exploration, which is compatible with neural networks, and by reducing statistical complexity through meta-, multi-task, and federated learning.
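As one example of randomization-based exploration, perturbed-history exploration acts greedily with respect to a history augmented with random pseudo-rewards, a mechanism that carries over to fitted models such as neural networks; the sketch below is for Bernoulli bandits, with the simulated environment and the perturbation scale a as illustrative assumptions.

    import numpy as np

    def phe_bernoulli_sketch(true_means, T, a=1.0, rng=None):
        # Perturbed-history exploration for a Bernoulli bandit: exploration
        # comes from re-estimating each arm on its observed history plus
        # roughly a * (number of pulls) random Bernoulli(1/2) pseudo-rewards.
        rng = np.random.default_rng() if rng is None else rng
        K = len(true_means)
        pulls = np.zeros(K, dtype=int)
        rewards = np.zeros(K)
        for t in range(T):
            if t < K:
                arm = t  # pull each arm once to initialize
            else:
                n_pseudo = np.ceil(a * pulls).astype(int)
                pseudo = rng.binomial(n_pseudo, 0.5)
                perturbed = (rewards + pseudo) / (pulls + n_pseudo)
                arm = int(np.argmax(perturbed))  # greedy on perturbed history
            pulls[arm] += 1
            rewards[arm] += rng.binomial(1, true_means[arm])
        return int(np.argmax(rewards / np.maximum(pulls, 1)))

The design choice here is that the randomness needed for exploration lives in the training data rather than in a posterior, so the same recipe applies whenever a model can be refit on perturbed histories.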
Explores the persistent challenges of exploration and statistically efficient adaptivity in the era of pre-trained models, including optimal experimental design for efficient LLM fine-tuning and off-policy evaluation from logged human feedback.
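Two minimal sketches in this direction, under illustrative assumptions: greedy D-optimal selection of training examples from feature representations (the feature matrix X and the budget are placeholders), and a basic inverse-propensity-scoring estimator for off-policy evaluation from logged feedback (the target_prob function name is hypothetical).

    import numpy as np

    def greedy_d_optimal(X, budget, reg=1e-3):
        # Greedily pick rows of X that maximize the log-determinant of the
        # regularized information matrix; by the matrix determinant lemma,
        # the marginal gain of row x is log(1 + x^T A^{-1} x).
        d = X.shape[1]
        A_inv = np.eye(d) / reg
        chosen = []
        for _ in range(budget):
            gains = np.einsum('ij,jk,ik->i', X, A_inv, X)  # x^T A^{-1} x
            gains[chosen] = -np.inf  # never pick the same example twice
            i = int(np.argmax(gains))
            chosen.append(i)
            Ax = A_inv @ X[i]
            A_inv -= np.outer(Ax, Ax) / (1.0 + X[i] @ Ax)  # Sherman-Morrison
        return chosen

    def ips_estimate(logged, target_prob):
        # Inverse propensity scoring: reweight logged rewards by the ratio
        # of target to logging policy probabilities; unbiased whenever the
        # logging policy covers every action the target policy can take.
        # `logged` holds (context, action, reward, logging_prob) tuples.
        ratios = np.array([target_prob(x, a) / p for x, a, r, p in logged])
        rewards = np.array([r for _, _, r, _ in logged])
        return float(np.mean(ratios * rewards))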