* Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models (Preprint)
* Trajectory Balance with Asynchrony: Decoupling exploration and learning for fast, scalable, LLM post-training (NeurIPS 2025)
* Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models (ICML 2025)
* Amortizing intractable inference in diffusion models for vision, language and control (NeurIPS 2024)
* Reasoning with Latent Diffusion in Offline Reinforcement Learning (ICLR 2024)
* Learning Temporally Abstract World Models without Online Experimentation (ICML 2023)
* Multi-Alpha Soft Actor-Critic: Overcoming Stochastic Biases in Maximum Entropy Reinforcement Learning (ICRA 2023)
* MLNav: Learning to Safely Navigate on Martian Terrains (RAL+ICRA 2022)
* Machine Learning Based Path Planning for Improved Rover Navigation (IEEE Aerospace Conference 2021)
Research Experience
- PhD Student at Mila, Quebec AI Institute
- Academic Collaborator at LawZero - Safe AI for Humanity
- Intern and Academic Collaborator at Lawrence Livermore National Laboratory (LLNL)
- Intern at Valence Labs, working on training flow bridges for molecular systems
- Intern at NASA Jet Propulsion Laboratory (JPL), working on more efficient Mars Rover motion planning
Education
- PhD: Mila, Quebec AI Institute, Université de Montréal; Supervisors: Glen Berseth and Nikolay Malkin
- Master's: Robotics, Carnegie Mellon University, Advisor: Dr. Jeff Schneider
- Bachelor's: Computer Science, Manipal Institute of Technology
Background
- Research Interests: reinforcement learning, reasoning, and probabilistic inference
- Professional Field: Artificial Intelligence, Machine Learning
- Brief Introduction: Currently a PhD student at Mila, Quebec AI Institute, affiliated with Université de Montréal and co-supervised by Glen Berseth and Nikolay Malkin. Works closely with Yoshua Bengio and is an academic collaborator at both LawZero - Safe AI for Humanity and Lawrence Livermore National Laboratory (LLNL).
Miscellany
- Personal Interests: Solving fundamental issues with LLMs, such as the long-context problem