Published papers including 'HAMburger: Accelerating LLM Inference via Token Smashing' and 'Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation'; worked on projects such as CodeLlama and Llama 2 Long.
Research Experience
Research Scientist Intern at Nvidia (June 2025 – December 2025), researching diffusion LLMs, model distillation, and related topics;
AI Resident at Meta AI (September 2022 – September 2023), working on CodeLlama, Llama 2 Long, and other projects;
Research Assistant at ETH Zurich (March 2022 – November 2022), working on offline reinforcement learning algorithms;
Machine Learning Engineer at ByteDance (August 2020 – August 2021), developing the search engine for Douyin's e-commerce platform;
Teaching Assistant at the Courant Institute, New York University (September 2018 – May 2019), tutoring students in computer system organization.
Education
PhD Student in Computer Science at the University of Chicago (2024 – Present), advised by Prof. Ce Zhang;
MS in Computer Science from ETH Zurich (2024);
BA in Computer Science with Honors from New York University (2020).
Background
Research interests include large language models and NLP, AI systems, and the science of foundation models. Worked as an AI Resident at Meta AI on LLMs and 3D computer vision. Graduated from NYU with honors in Computer Science and received the Prize for Outstanding Performance in Computer Science.
Miscellany
Feel free to drop me an email about anything, especially potential collaborations!