Has made significant contributions to high-performance computing, including but not limited to: research on stall-free tensor offloading from GPU memory to CPU memory; work on training large machine learning models with heterogeneous memory, which has been integrated into Microsoft DeepSpeed; a debugging tool for persistent memory programs, which won the Distinguished Artifact Award at ASPLOS'21 and was integrated into Intel PMDK; work on accelerating power grid simulation with machine learning, which was highlighted by the U.S. Department of Energy; and an MPI fault tolerance benchmark suite and a study of natural error resilience in HPC applications, both reported by HPCwire.
Awards: Amazon Research Award (2025), 2nd place in the AWS Programming Contest (ASPLOS'25/EuroSys'25), Virginia Tech CS Early Career Alumni Award (2023), Oracle Research Award (2022), ASPLOS Distinguished Artifact Award (2021), Facebook Faculty Research Award (2021), Western Digital Award (2021), Berkeley Lab University Faculty Fellowship (2016), NSF CAREER Award (2016), NVIDIA GPU Research Center (2016), SC best poster nomination (2016), SC best student paper nomination (2014), Oak Ridge National Lab (CSMD) Distinguished Contributor Award (2013).
Research Experience
Was a research scientist at Oak Ridge National Laboratory (ORNL) from 2011 to 2014. Currently an associate editor of IEEE Transactions on Parallel and Distributed Systems (TPDS). Former director of the NVIDIA GPU Research Center at UC Merced. Currently the planning director of the NSF IUCRC Center for Memory System Research (CEMSYS).
Education
Earned a PhD in Computer Science from Virginia Tech.
Background
Associate Professor of Computer Science and Engineering at the University of California, Merced. Director of the Parallel Architecture, System, and Algorithm Lab (PASA) and co-director of the High Performance Computing Systems and Architecture Group at UC Merced. Co-founder and Chief Scientist of Yotta Labs Inc. His research focuses on high-performance computing (HPC), with strong ties to computer systems, especially systems for large-scale AI/ML.
Miscellany
Involved in the open-source project Bloombee, which focuses on decentralized AI inference and fine-tuning.