Co-first author of 'SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training' at ICML 2025.
First author of 'Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning' at NeurIPS 2024.
Co-author of 'Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs', presented orally at CVPR 2024.
First author of 'Investigating the Catastrophic Forgetting in Multimodal Large Language Model Fine-Tuning' at CPAL 2024.
Co-first author of 'Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning' at NeurIPS 2023.
Co-first author of 'Understanding the Complexity Gains of Single-Task RL with a Curriculum' at ICML 2023.
First author of 'Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning' published in JAIR 2022.
Contributed to the Gemini 2.5 project on advanced reasoning, multimodality, long context, and next-generation agentic capabilities.
Background
Currently a Member of Technical Staff at xAI.
Research interests span machine learning, reinforcement learning, and large models.
Curious about things not yet understood, such as how the universe works and what his cats are thinking.
Recently developed an interest in critiquing 'fake papers': those that appear fancy but lack real substance.
Currently focused on near-future (≤5 years) practical directions such as data cleaning, evaluation, post-training, and agents, especially for large multimodal models.