Machine Learning Project Intern (Business Integrity Data Cycling Center) - 2026 Start (BS/MS)

TikTok
San Jose, California

About the job

We are looking for passionate Machine Learning Engineer Interns with strong problem-solving skills to join forces with talented cross-functional partners (business operations, data science, engineering, and product management) and tackle cutting-edge challenges in generative AI. In this role, you will contribute to the company's core business across innovative advertising products, campaign management, and measurement solutions. You will see the direct impact of your day-to-day work on customer satisfaction and company growth. As a project intern, you will have the opportunity to engage in impactful short-term projects that give you a glimpse of real-world professional experience. You will gain practical skills through on-the-job learning in a fast-paced work environment and develop a deeper understanding of your career interests.

Responsibilities

Support dataset construction and data quality initiatives, including sampling, preprocessing, label validation, and root cause analysis of human annotation inconsistencies.

Build and improve model evaluation frameworks for state-of-the-art generative AI models in production, and/or contribute to the iteration of next-generation generative AI models.

Contribute to human-in-the-loop systems by analyzing human annotator behavior and ML-assisted labeling strategies to improve efficiency and reliability.

Work closely with Product Managers, Data Scientists/Analysts, and cross-functional Software/Machine Learning Engineers to understand AI-generated content (AIGC) evaluation, safety, and data requirements, translating business problems into measurable ML tasks.

Communicate findings, experimental results, and data insights clearly to technical and non-technical stakeholders, supporting data-driven decision making.

Qualifications

Minimum

Undergraduate or postgraduate candidate currently pursuing a degree in Machine Learning, Computer Science, Software Engineering, or a closely related quantitative discipline.

Solid grounding in statistical theory and modeling and in transformer-based model fundamentals, acquired through coursework and practical projects.

Past internship/research experience in generative AI, LLM/VLM, or deep learning models.

Proficiency in Python and shell scripting, with hands-on experience building ML prototypes, data processing workflows, or experimental pipelines.

Familiarity with ML/data science ecosystems such as NumPy, pandas, PyTorch/TensorFlow, scikit-learn, or similar frameworks.

Strong problem-solving and analytical skills, with the ability to reason about model behavior, data quality, and performance trade-offs.

Effective communication and collaboration skills, enabling productive work with cross-functional partners including ML engineers, data teams, and product stakeholders.

Preferred

Experience working with SQL for data querying, aggregation, and analysis on large-scale or production-like datasets.

Exposure to large-scale data processing or distributed computing environments (e.g., Spark, Hive, Airflow, or equivalent).

Understanding of human-in-the-loop systems, data annotation workflows, or data quality challenges in the field of generative AI.

Demonstrated ability to build small projects, prototypes, or research experiments involving pre-training, post-training, and evaluation of generative AI models.