Publications: 'Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023' and others; projects such as LlamaIndex.
Research Experience
May 2025 - Present: Research Scientist at MBZUAI, specializing in foundational LLM research and developing high-quality training datasets.
2023 - May 2025: Applied Scientist, leading the development of advanced LLM systems and RAG pipelines.
2023: Computational Scientist, designed smaRT, an AI system for automated ticket classification and resolution.
2021 - 2022: Research Intern, developed the S2AMP dataset and a large-scale paper clustering system.
2018 - 2023: Research Assistant, led the MathSeer project and developed neural ranking models.
2014 - 2017: Researcher, built dialogue-based natural language understanding systems and email categorization algorithms.
Education
PhD in Information Sciences and Technology, Pennsylvania State University, August 2017 - May 2023. Thesis: Design and Data Mining Techniques for Large-Scale Scholarly Digital Libraries and Search Engines.
Integrated Post Graduate in Information Technology, Indian Institute of Information Technology and Management, July 2009 - June 2014.
Background
Specializes in building state-of-the-art retrieval systems and training large language models. Works at the intersection of language models and scientific discovery, developing systems that make AI more useful, reliable, and accessible. Expertise spans fine-tuning, optimizing, and deploying large language models; building advanced search and retrieval systems; designing AI systems that can autonomously plan, reason, and execute complex tasks; and building AI systems that accelerate research.