- UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction (ICML 2025).
- BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks (ICLR 2025).
- GenRL: Multimodal Foundation World Models for Generalist Embodied Agents (NeurIPS 2024).
- Rendering-Aware Reinforcement Learning for Vector Graphics Generation (NeurIPS 2025).
- StarVector: Generating Scalable Vector Graphics Code From Images And Text (CVPR 2025).
Research Experience
Currently a Staff Research Scientist at ServiceNow, as well as an Adjunct Professor and core industry member at Mila, Montréal. During his Ph.D., he interned as a Research Scientist at Google DeepMind.
Education
Ph.D. from Mila, University of Montreal, supervised by Prof. Aaron Courville; Master's in Computer Science from IIT Delhi, where he received the Prof. A.K. Sinha Best Student Award.
Background
Research interests broadly span generative models and reinforcement learning, with a recent focus on multimodal perception and world representations, which are key to generalist AI systems that integrate perception and action while incorporating feedback from the environment.
Miscellany
If you are interested in his work and would like to explore fundamental research questions in these areas, collaborate, or receive mentorship, feel free to reach out.