About the job
NVIDIA is seeking a Senior Research Scientist passionate about multi-modal language models. Our team drives Nemotron Multi-modal technology, and with your help we will continue to make our models state-of-the-art open-source multi-modal models. We have a unique perspective in that we strive for open models, open weights, and open data. We want to deliver models that work amazingly well in the real world right out of the box, and we also want to uplift the whole ecosystem of users of multi-modal LLMs.
Responsibilities
Driving new abilities into the model.
Improving generalization of existing functionalities by understanding weak points, designing a data synthesis solution, and retraining models.
Developing recipes for training models that mix multiple modalities together, such as text, image, video, and audio.
Designing solutions that improve Pareto efficiency.
Collaborating with researchers to translate cutting-edge ideas into production-ready implementations.
Exploring new paradigms for evaluation.
Demonstrating strong engineering practices and contributing to open-source communities.
Qualifications
Minimum
PhD in Computer Science, Electrical Engineering, or a related field, or equivalent research experience in LLMs, systems, or related areas.
4+ years of experience in computer vision, especially multi-modal LLMs.
Proficiency in Python with hands-on experience in frameworks such as PyTorch.
Solid background in computer science fundamentals: algorithms, data structures, parallel/distributed computing, and systems programming.
Proven ability to collaborate across research and engineering teams in multifaceted environments.
Preferred
Research experience specific to multi-modal LLMs.
Experience developing and scaling large distributed systems for deep learning.
Contributions to open-source LLM systems or large-scale AI infrastructure.