Machine Learning Intern

About the job

We are looking for highly motivated interns to join our compute team as machine learning scientists working at the intersection of machine learning and life sciences for our Summer 2026 cohort. You will partner directly with a team mentor to develop and/or apply ML methods to process and analyze large-scale datasets from multiple modalities over the course of the summer (11-12 weeks). These internships can be based at our South San Francisco headquarters with a hybrid work schedule, or can be remote, depending on the team mentor's location and business need.

Responsibilities

- Leverage publicly available single-cell transcriptomics resources to extract insights about disease mechanisms relevant to our therapeutic areas;

- Develop, productionize, and deploy cutting-edge ML approaches to integrate large-scale multi-modal phenotypic datasets;

- Develop workflows to enable post-GWAS (Genome-Wide Association Study) analysis of results, e.g., fine-mapping;

- Conduct translational genetics deep dives, enabling higher-throughput annotation and exploration of candidate genes from our discovery efforts;

- Design statistical methods to improve rare variant burden tests, and methods to improve power for longitudinal phenotypes;

- Develop ML models for imputing disease-relevant phenotypes from high-content clinical imaging datasets, e.g., MRI, PET-CT;

- Develop ML methods for disentangling and genetically interpreting axes of variation in complex phenotypes;

- Use LLMs to extract disease-relevant information from medical records;

- Explore generative models of small molecules, biologics, and/or oligonucleotide therapeutics in various data modalities, such as 2D and 3D representations, for hit-to-lead drug discovery efforts;

- Develop new geometric deep learning methods to better characterize nuanced molecular properties and relationships;

- Identify and prototype novel microscopy-driven phenotyping workflows, including hardware acquisition, post-processing, and featurization;

- Develop robust software tooling to support the deployment of new and existing methods for general use by insitro scientists;

- Optimize existing microscopy acquisition methods in both hardware and software, using ML feature outputs to benchmark improvements.

Qualifications

Minimum

- Working towards a BS, MS, or Ph.D. in engineering, computational biology, systems biology, computer science, mathematics, statistics, life science, chemistry, physics, or a related field.

- Proficiency in one or more general-purpose programming languages. We primarily use Python.

- Interest in using and developing brand-new statistical and machine learning methods inspired by real problems.

- Curiosity about human physiology or disease biology.

- Commitment to writing high-quality, well-commented code and documentation.

- Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions.

- Passion for making a difference in the world.

Preferred

- First-hand experience with biological data, preferably using computational approaches.

- Passion for learning how to work with diverse functional genomic assays (RNA/DNase/ATAC/ChIP-seq, etc.).

- Interest in learning how to analyze single-cell RNA-seq data.

- Solid understanding of computational chemistry, including virtual screening (classic QSAR modeling, structure-based drug discovery), library design, etc.

- Demonstrated ability to use and develop cutting-edge statistical and machine learning methods inspired by real problems.

- Experience with machine learning and deep learning frameworks (e.g., scikit-learn, PyTorch).

- Demonstrated ability to write high-quality, production-ready code (readable, well-tested, with well-designed APIs).

- Experience with Linux environments, database languages (e.g., SQL, NoSQL), and version control practices and tools such as Git.

- Publications of high-quality work in relevant computational biology, bioinformatics, systems biology, life sciences, or biomedical venues, including journals and conferences.

- Passion for solving problems, asking questions, and learning independently.

- Familiarity with the SciPy/PyData ecosystem (NumPy, pandas, SciPy, Dask, etc.).

- Familiarity with cloud computing services (AWS or GCP).

- Familiarity with statistical analysis software, e.g., R.