About the job
You will work as a Senior Research Software Development Engineer focused on advancing engine-level language model capabilities, from applied research through to integration. This role is responsible for integrating in-house techniques and state-of-the-art research into a variety of first-party (1P) Microsoft engines and third-party (3P) industry engines. You will translate research ideas into high-performance, production-ready implementations, contributing directly to new engine capabilities that improve model correctness, efficiency, robustness, and expressive control.
Responsibilities
Advance language model engine capabilities through applied research and production engineering, integrating in-house innovations and state-of-the-art techniques to improve model accuracy, speed, reliability, and expressivity across first-party and third-party engines.
Design, implement, and review performance-critical engine code (primarily in Python and Rust), ensuring high standards for correctness, test coverage, security, diagnosability, and maintainability, while coaching peers through rigorous and timely code reviews.
Apply AI-native development practices across the full SDLC, using AI tools responsibly for design, coding, testing, and analysis, and taking ownership of the quality and correctness of AI-assisted outputs while helping establish best practices across the team.
Develop and evolve advanced inference techniques (e.g., speculative decoding, constrained decoding, structured generation), validating design choices through experimentation, benchmarking, and production telemetry.
Own engine-level design and integration decisions, producing clear design documents, evaluating trade-offs across multiple architectural options, and collaborating across teams to ensure solutions meet requirements for performance, scalability, reliability, security, and cost.
Drive engineering excellence in production environments, including comprehensive testing strategies, observability, live-site readiness, incident response, and post-incident learning, with a focus on reducing operational risk in multi-tenant inference systems.
Contribute to and leverage open-source LM infrastructure where appropriate, responsibly reusing and extending external code, sharing learnings with the broader community, and continuously staying current with emerging research, tools, and engine-level techniques.
Qualifications
Minimum
Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, Rust or C++, and Python OR equivalent experience. Other Requirements: Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years.
Preferred
Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, Rust or C++, and Python OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with