About the job
As a Principal Software Engineer, you will work on the infrastructure and tools that support large-scale model fine-tuning, evaluation, and inference.
Responsibilities
Lead collaboration with engineers and researchers to build and optimize training infrastructure and tools for LLMs, SLMs, multimodal models, and code-specific models.
Design, build and improve services with high scalability and reliability.
Design and implement services that serve production traffic and meet security and privacy requirements.
Lead the efforts to deliver and improve engineering systems and practices to ensure service quality in complex cloud environments.
Contribute to the deployment and monitoring of services in production environments.
Qualifications
Minimum
Bachelor's degree in Computer Science or a related technical field and 8+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; or equivalent experience.
Preferred
6+ years of experience designing, developing, and shipping high-quality software.
3+ years of experience with distributed systems and cloud-based infrastructure.
2+ years of experience with containerization tools (e.g., Docker, Kubernetes).
2+ years of experience with DevOps practices (CI/CD, automated testing, deployment, etc.).
Passionate and self-motivated, with a strong ability to learn independently, enter new domains, and manage through uncertainty in an innovative team environment.
Familiarity with virtualization technologies.
Familiarity with production ML systems and concepts like model serving, caching, batching, and monitoring.