Published papers at top conferences such as ICML, ICLR, and HPCA, including:
- SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
- Sparse Video-Gen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
- LServe: Efficient Long-Sequence LLM Serving with Unified Sparse Attention
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
- LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications
- VideoTime³: A 40-uJ/frame 38 FPS Video Understanding Accelerator With Real-Time DiffFrame Temporal Redundancy Reduction and Temporal Modeling
- TorchSparse: Efficient Point Cloud Inference Engine
- QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits
- PointAcc: Efficient Point Cloud Accelerator
- NAAS: Neural Accelerator Architecture Search
Selected as a 2021 DAC Young Fellow and a winner of the 2021 Qualcomm Innovation Fellowship.
Research Experience
Currently working as a research scientist at NVIDIA.
Education
PhD from MIT EECS (HAN Lab), advised by Prof. Song Han; B.Eng. in Electronic Engineering from Tsinghua University.
Background
Currently a research scientist at NVIDIA. Research area is efficient deep learning, with a special focus on the co-design of algorithms, systems, and hardware for foundation models (diffusion models, LLMs, etc.).