2024: 'Neural differential equations for temperature control in buildings under demand response programs' published in Applied Energy
2024: 'Do Transformer World Models Give Better Policy Gradients?' presented at ICML
2024: 'Maximum entropy GFlowNets with soft Q-learning' presented at AISTATS
2024: Multiple papers at ICLR including 'Decoupling regularization from the action space', 'Bridging State and History Representations', 'Course Correcting Koopman Representations', and 'Motif: Intrinsic Motivation from Artificial Intelligence Feedback'
2023: Oral presentation at NeurIPS – 'When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment'
2023: Poster presentations at NeurIPS – 'Block-State Transformers' and 'Policy Optimization in a Noisy Neighborhood'
2023: Spotlight paper at NeurIPS – 'Double Gumbel Q-Learning'
2023: ICLR notable top 5% paper – 'Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier'
2022: NeurIPS Datasets and Benchmarks paper – 'Myriad: a real-world testbed to bridge trajectory optimization and deep learning'
2022: ICML and RLDM papers – 'The Primacy Bias in Deep Reinforcement Learning' and 'Direct Behavior Specification via Constrained Reinforcement Learning'
2022: ICLR paper – 'Continuous-Time Meta-Learning with Forward Mode Differentiation'
2021: NeurIPS workshop papers – 'Meta Dynamic Programming' and 'Long-Term Credit Assignment via Model-based Temporal Shortcuts'
Background
Associate Professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal
CIFAR AI Chair
Core member of Mila
Affiliated with the Institute for Data Valorization (IVADO)
Conducts research at the intersection of theory and application in reinforcement learning (RL)
Focuses on real-world problems in HVAC systems and molecular modeling
Works on improving RL through representation learning, neural differential equations, and transformer-based models
Particularly interested in tackling the curse of horizon in long-term planning
Recently exploring the use of large language models to address specification challenges in RL for better alignment and sample efficiency