Mostofa Patwary
Google Scholar ID: 0rt4tbMAAAAJ
Director, Applied Deep Learning Research, NVIDIA
Natural Language Processing
Large Scale Deep Learning
High Performance Computing
Parallel
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 10,446
H-index: 35
i10-index: 57
Publications: 20
Co-authors: 13
Contact
GitHub
Publications (10 items)
LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts (2026). Cited: 1
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs (2025). Cited: 0
NVIDIA Nemotron Nano V2 VL (2025). Cited: 0
Multi-Agent Evolve: LLM Self-Improve through Co-evolution (2025). Cited: 0
RLP: Reinforcement as a Pretraining Objective (2025). Cited: 0
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset (2025). Cited: 0
Fusing LLM Capabilities with Routing Data (2025). Cited: 0
NEMOTRON-CROSSTHINK: Scaling Self-Learning beyond Math Reasoning (2025). Cited: 0
Academic Achievements
Published high-impact papers at venues including NeurIPS 2022, ACL 2021, EMNLP 2020, and EACL 2023
Contributed to Megatron-Turing NLG 530B, the world’s largest and most powerful generative language model at the time
Proposed the Minitron approach for LLM pruning and distillation (ArXiv 2024)
Contributed to StarCoder 2 and The Stack v2 (ArXiv 2024)
The paper "Scaling Language Model Training to a Trillion Parameters Using Megatron" received the Best Student Paper Award at SC 2021
The Megatron-LM paper has roughly 300 citations
Co-authors (13 total)
Mohammad Shoeybi, Senior Director of Applied Research, NVIDIA
Bryan Catanzaro, NVIDIA
Alok Choudhary, Professor, Northwestern University
Pradeep Dubey, Intel Corporation
Jared Casper, Research Scientist, NVIDIA