Mostofa Patwary
Google Scholar ID: 0rt4tbMAAAAJ
Director, Applied Deep Learning Research, NVIDIA
Natural Language Processing
Large Scale Deep Learning
High Performance Computing
Parallel
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 10,446
H-index: 35
i10-index: 57
Publications: 20
Co-authors: 13
Contact
GitHub
Publications (10 items)
LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts (2026). Cited: 1
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs (2025). Cited: 0
NVIDIA Nemotron Nano V2 VL (2025). Cited: 0
Multi-Agent Evolve: LLM Self-Improve through Co-evolution (2025). Cited: 0
RLP: Reinforcement as a Pretraining Objective (2025). Cited: 0
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset (2025). Cited: 0
Fusing LLM Capabilities with Routing Data (2025). Cited: 0
NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning (2025). Cited: 0
Academic Achievements
Published high-impact papers at venues including NeurIPS 2022, ACL 2021, EMNLP 2020, and EACL 2023
Contributed to Megatron-Turing NLG 530B, the world’s largest and most powerful generative language model at the time
Proposed the Minitron approach for LLM pruning and distillation (ArXiv 2024)
Contributed to StarCoder 2 and The Stack v2 (ArXiv 2024)
The paper "Scaling Language Model Training to a Trillion Parameters Using Megatron" received the Best Student Paper Award at SC 2021
The Megatron-LM paper has roughly 300 citations
Co-authors (13 total)
Mohammad Shoeybi, Senior Director of Applied Research, NVIDIA
Bryan Catanzaro, NVIDIA
Alok Choudhary, Professor, Northwestern University
Pradeep Dubey, Intel Corporation
Jared Casper, Research Scientist, NVIDIA