Scholar

Pierre-Luc Bacon

Google Scholar ID: 9H77FYYAAAAJ

University of Montreal

reinforcement learning, artificial intelligence

Citations & Impact

All-time citations: 3,174

Publications

16 items

Rotation-Preserving Supervised Fine-Tuning

2026

Layerwise LQR for Geometry-Aware Optimization of Deep Networks

2026

Towards Practical World Model-based Reinforcement Learning for Vision-Language-Action Models

2026

What Makes Value Learning Efficient in Residual Reinforcement Learning?

2026

Reward Redistribution for CVaR MDPs using a Bellman Operator on L-infinity

2026

Long-Horizon Model-Based Offline Reinforcement Learning Without Conservatism

2025

The Three Regimes of Offline-to-Online Reinforcement Learning

2025

Planning with Unified Multimodal Models

2025

Resume

Academic Achievements

2024: 'Neural differential equations for temperature control in buildings under demand response programs' published in Applied Energy
2024: 'Do Transformer World Models Give Better Policy Gradients?' presented at ICML
2024: 'Maximum entropy GFlowNets with soft Q-learning' presented at AISTATS
2024: Four papers at ICLR – 'Decoupling regularization from the action space', 'Bridging State and History Representations', 'Course Correcting Koopman Representations', and 'Motif: Intrinsic Motivation from Artificial Intelligence Feedback'
2023: Oral presentation at NeurIPS – 'When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment'
2023: Poster presentations at NeurIPS – 'Block-State Transformers' and 'Policy Optimization in a Noisy Neighborhood'
2023: Spotlight paper at NeurIPS – 'Double Gumbel Q-Learning'
2023: ICLR notable top 5% paper – 'Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier'
2022: NeurIPS Datasets and Benchmarks paper – 'Myriad: a real-world testbed to bridge trajectory optimization and deep learning'
2022: ICML and RLDM papers – 'The Primacy Bias in Deep Reinforcement Learning' and 'Direct Behavior Specification via Constrained Reinforcement Learning'
2022: ICLR paper – 'Continuous-Time Meta-Learning with Forward Mode Differentiation'
2021: NeurIPS workshop papers – 'Meta Dynamic Programming' and 'Long-Term Credit Assignment via Model-based Temporal Shortcuts'

Background

Associate Professor at Université de Montréal's DIRO
CIFAR AI Chair
Core member of Mila
Affiliated with the Institute for Data Valorization (IVADO)
Conducts research at the intersection of theory and application in reinforcement learning (RL)
Focuses on real-world problems in HVAC systems and molecular modeling
Works on improving RL through representation learning, neural differential equations, and transformer-based models
Particularly interested in tackling the curse of horizon in long-term planning
Recently exploring the use of large language models to address specification challenges in RL, with the aim of better alignment and sample efficiency

Co-authors

16 total