- Marconi: Prefix Caching for the Era of Hybrid LLMs
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Awards
- Best Paper Award at the ICML Hardware Aware Efficient Training Workshop, 2022
- Inaugural Stanford Open Source Software Prize, 2024
Research Experience
Current PhD Students: Ted Zadouri, Berlin Chen, Wentao Guo, Xinle Cheng (co-advised with Ravi Netravali), Lijie Yang (co-advised with Ravi Netravali), Liane Ganti (co-advised with Elad Hazan).
Education
PhD, Department of Computer Science, Stanford University
Background
Research Interests: Machine learning and systems, with a focus on efficient training and inference, hardware-aware algorithms, and sequence models with long-range memory.
Bio: Assistant Professor of Computer Science at Princeton University; Co-founder & Chief Scientist of Together AI.