- September 2025: Two papers accepted by NeurIPS 2025.
- August 2025: One paper accepted by TMLR.
- May 2025: One paper accepted by TMLR.
- April 2025: DFloat11 (lossless compression for LLMs) was covered by 新智元 and 机器之心.
- April 2025: The Stop Overthinking survey was covered by 新智元.
- March 2025: Released our survey "Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models."
- February 2025: Three papers accepted by CVPR 2025; one of them, TopV, marks my first project as an advisor.
- September 2024: One paper accepted by NeurIPS 2024.
Research Experience
Spring 2024: Research Intern at Snap Research (Creative Vision team); proposed BitsFusion, a 1.99-bit quantization method for text-to-image generative models.
2022: Research Intern at Tencent America Media Lab, exploring the efficiency and robustness of learned image compression and Transformer models.
2019: Full-time Algorithm Engineer at JD, working on face verification and recognition.
2018: R&D Intern on the PaddlePaddle team at Baidu, where I helped initiate the Paddle-Lite deep learning inference framework.
Education
Ph.D.: Rutgers University, Advisor: Prof. Bo Yuan.
Background
Research Interests: Efficient AI and Trustworthy AI. In Efficient AI, I focus on developing resource-efficient deep learning models without compromising accuracy or performance. In Trustworthy AI, I investigate model vulnerability and robustness through adversarial and backdoor attacks.
Miscellany
Personal Interests: Basketball, DOTA/DOTA2, and World of Warcraft. Favorite players: Tracy McGrady, Stephen Curry, Lionel Messi, and PIS (YaphetS).