Publications
Adaptive Computation Pruning for the Forgetting Transformer, COLM 2025 (First author).
Forgetting Transformer: Softmax Attention with a Forget Gate, ICLR 2025 (First author).
The Curse of Diversity in Ensemble-Based Exploration, ICLR 2024 (First author).
Improving Generative Imagination in Object-Centric World Models, ICML 2020 (First author).
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition, ICLR 2020 (Co-first author; equal contribution marked with * in the paper).
GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs, NeurIPS 2019 (Collaborator).
Background
Third-year Ph.D. student at Mila and the University of Montreal, advised by Professor Aaron Courville.
Research goal is to understand and build general intelligence; currently focused on long-context sequence models (especially linear-complexity models) and their applications in reinforcement learning (RL).
Views an agent fundamentally as a sequence model, emphasizing the temporal and sequential nature of agent-environment interaction (memory, experience stream, learning, credit assignment, etc.) as central to intelligence.
Believes RL is likely necessary for superhuman intelligence, but that simple future prediction, not just reward-based learning, may remain essential, since reward alone provides only a limited learning signal.
Intrigued by the inefficiency of human thought and reasoning in natural language, and curious whether neural networks can develop mathematical or high-dimensional geometric intuition beyond human capabilities.