Publications
His papers have been accepted at top conferences, including NeurIPS, DAC, and MICRO, with a best paper nomination at ASP-DAC 2024. Selected papers include:
- DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
- Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference
- PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs
- LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks
- MINT: Multiplier-less INTeger Quantization for Energy Efficient Spiking Neural Networks
- Workload-Balanced Pruning for Sparse Spiking Neural Networks
- Wearable-based Human Activity Recognition with Spatio-Temporal Spiking Neural Networks
- SATA: Sparsity-Aware Training Accelerator for Spiking Neural Networks
Research Experience
He has interned at Microsoft Azure on the AI System Architecture team and at Cerebras Systems on the ASIC team.
Education
He is currently a final-year Ph.D. student in the Department of Electrical Engineering at Yale University, advised by Prof. Priyadarshini Panda. Prior to joining Yale, he earned his B.S. from the University of Wisconsin-Madison, with majors in Electrical Engineering, Computer Science, and Mathematics. As an undergraduate, he worked with Prof. Joshua San Miguel on designing computer architectures for stochastic computing.
Background
His research focuses on designing energy-efficient computer architectures, systems, and algorithms for AI workloads, particularly those with asymmetric operand precision or sparsity. He is also interested in neuromorphic computing, particularly spiking neural networks, as an enabler of bio-plausible and energy-efficient deep learning.