Research Intern - Multimodal Language Models

Microsoft
San Francisco Bay area / New York City metropolitan area | 2025-11-21 | Onsite

About the job

We are seeking a Research Intern to explore innovative approaches for building efficient multimodal language models. The role will focus on techniques such as model compression, quantization, and model optimization for efficient deployment on resource-constrained platforms. You will work on training strategies to enhance performance and scalability across vision-language tasks. Responsibilities include implementing prototypes, designing experiments, analyzing results, and contributing to research that pushes the boundaries of efficiency in AI systems. Ideal candidates have a foundation in machine learning, experience with deep learning frameworks (e.g., PyTorch), and an interest in scalable model design and optimization. Familiarity with multimodal architectures and low-bit quantization is a plus.

Responsibilities

Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world’s best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.

Qualifications

Minimum

Accepted or currently enrolled in a PhD program in Computer Science or a related STEM field.

Other Requirements: Research Interns are expected to be physically located at their manager’s Microsoft worksite for the duration of their internship.

In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position, as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance so they will be ready to submit your letter.

Preferred

Foundation in machine learning and deep learning, with expertise in areas such as multimodal language models, transformer architecture, efficient model design, compression, and quantization.

Proficiency in modern deep learning frameworks (e.g., PyTorch, DeepSpeed) for scalable model development and optimization.

Proven ability to define and execute original research agendas, demonstrating creativity and technical rigor.

Motivation to publish in top-tier academic venues, showcasing impactful contributions to the research community.