Published several papers at top-tier conferences, including NeurIPS, ICML, ICLR, and EMNLP. Key contributions include: a lossless compression technique that reduces model size by 30% while preserving bit-for-bit identical outputs and enabling efficient GPU inference; fine-tunable sketches for efficient LLM adaptation; LeanQuant, an accurate and scalable LLM quantization method; and an efficient LLM inference method that uses only 1 bit per channel for the KV cache.
Research Experience
During his PhD at Rice University, he focused on compression and optimization techniques for LLMs, developing a lossless compression method that reduces model size by 30% while preserving bit-for-bit identical outputs and enabling efficient GPU inference.
Education
2021 - 2025: PhD in Computer Science, Rice University, advised by Prof. Anshumali Shrivastava.
2016 - 2021: B.S. in Computer Science, University of Waterloo.
Background
PhD candidate in Computer Science, with research interests in lossless and lossy model compression, inference optimizations, accurate and efficient fine-tuning, GPU kernel design and optimization, and quantization. Aims to make large language models (LLMs) and foundation models more efficient, accurate, and accessible.
Miscellany
He also goes by Tony and has made open-source contributions; his work has reached #1 on Hacker News, and his models on Hugging Face receive thousands of monthly downloads.