Pengfei Zheng

Google Scholar ID: 3QxdztoAAAAJ
Huawei Technologies
Machine Learning Systems, System-Algorithm Co-Design, Distributed Systems, Data+AI
Citations & Impact (all-time)
  • Citations: 241
  • H-index: 5
  • i10-index: 4
  • Publications: 12
  • Co-authors: 12
Academic Achievements
  • arXiv preprint: 'Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling'
  • 'CoffeeBoost: Gradient Boosting Native Conformal Inference for Bayesian Optimization' accepted to AAAI 2025
  • 'Centrum: Model-based Database Auto-tuning with Minimal Distributional Assumptions' accepted to SIGMOD 2025
  • 'Mirage: MOE + Decision Transformer for non-interruption, non-overlap resource provision in ML training' published at SC 2023
  • 'Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning' published at NSDI 2023
  • Mentored students including Lynn Liu (HyperAPX, now a Ph.D. student at UC Berkeley), Rui Pan (Shockwave, now a Ph.D. student at Princeton), and Calvin Ma (Hound)
Background
  • Full-time Staff Researcher at Huawei Technologies
  • Research areas include autonomous system agents, distributed systems, and disaggregated datacenter architecture
  • Designs statistical and neural learning methods to model dynamics and uncertainty in large-scale distributed computer systems
  • Develops algorithmic decision-making mechanisms (e.g., convex/nonlinear optimization, Bayesian optimization, contextual bandits, reinforcement learning) to optimize system performance, efficiency, and scalability
  • Recent work includes black-box optimization for autonomous system tuning, real-time recommendation algorithms for preconditioner/solver selection in linear systems, dynamic market theory, and stochastic policy learning for scheduling and resource allocation
  • Works on high-performance ML training and inference, building multi-dimensional hybrid-parallelism solvers (data, tensor, pipeline, sequence, and expert parallelism, combined with offload, rematerialization, and pipeline interleaving) to optimize token throughput, MFU, and scale-out linearity for hyper-scale LLM training on massive GPU/NPU clusters
  • Researches dynamic MoE layers and real-time model/input pruning techniques to accelerate LLM inference