Tianyi Zhang
Scholar

Google Scholar ID: ekRl428AAAAJ
Rice University
Large Language Models · Efficient Inference · Model Compression
Citations & Impact (all-time)
Citations: 278
H-index: 6
i10-index: 5
Publications: 17
Co-authors: 0
Resume (English only)
Academic Achievements
  • Published several papers in top-tier conferences such as NeurIPS, ICML, ICLR, and EMNLP.
  • Developed a lossless compression technique that reduces model size by 30% while maintaining identical outputs and enabling efficient GPU inference.
  • Introduced fine-tunable sketches for efficient LLM adaptation.
  • Presented LeanQuant, an accurate and scalable LLM quantization method.
  • Developed an efficient LLM inference method using only 1 bit per channel for the KV cache.
Research Experience
  • During his PhD at Rice University, he focused on compression and optimization techniques for LLMs, developing a lossless compression method that reduces model size by 30% while preserving bit-for-bit identical outputs and enabling efficient GPU inference.
Education
  • 2021 - 2025: PhD in Computer Science at Rice University, advised by Prof. Anshumali Shrivastava; 2016 - 2021: B.S. in Computer Science from the University of Waterloo.
Background
  • PhD candidate in Computer Science, with research interests in lossless and lossy model compression, inference optimizations, accurate and efficient fine-tuning, GPU kernel design and optimization, and quantization. Aims to make large language models (LLMs) and foundation models more efficient, accurate, and accessible.
Miscellany
  • He also goes by Tony and has made open-source contributions; his work reached #1 on Hacker News, and his models on Hugging Face receive thousands of monthly downloads.