Publications: 'Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023' and others; projects such as LlamaIndex.
Research Experience
May 2025 - Present: Research Scientist at MBZUAI, specializing in foundational LLM research and developing high-quality training datasets.
2023 - May 2025: Applied Scientist, leading the development of advanced LLM systems and RAG pipelines.
2023: Computational Scientist, designed smaRT, an AI system for automated ticket classification and resolution.
2021 - 2022: Research Intern, developed the S2AMP dataset and a large-scale paper clustering system.
2018 - 2023: Research Assistant, led the MathSeer project and developed neural ranking models.
2014 - 2017: Researcher, built dialogue-based natural language understanding systems and email categorization algorithms.
Education
PhD in Information Sciences and Technology, Pennsylvania State University, August 2017 - May 2023. Thesis: Design and Data Mining Techniques for Large-Scale Scholarly Digital Libraries and Search Engines.
Integrated Post Graduate in Information Technology, Indian Institute of Information Technology and Management, July 2009 - June 2014.
Background
Specializes in building state-of-the-art retrieval systems and training large language models. Works at the intersection of language models and scientific discovery, developing systems that make AI more useful, reliable, and accessible. Expertise spans fine-tuning, optimizing, and deploying large language models; building advanced search and retrieval systems; designing AI systems that can autonomously plan, reason, and execute complex tasks; and building AI systems that accelerate research.