Data Knowledge System Research Scientist - (Data Platform-Global Live) - Global Frontier Tech Recruitment Program - 2027 Start (PhD)

TikTok
San Jose, California

About the job

The Data Platform Global Live team is dedicated to empowering the growth of the TikTok LIVE business through big data. We help our business partners achieve their missions by building high-quality real-time and offline data warehouses, creating efficient, easy-to-use data assets in various forms, and exploring and implementing business-oriented data solutions. We provide stable and reliable data capabilities for the daily operations, analysis, and decision-making of TikTok LIVE features, as well as robust data support to enhance streamers' live performance.

We are building a next-generation enterprise knowledge system for the LLM era. Our goal is to enable large language models to understand, access, and operate on enterprise data, including data warehouses, documents, logs, and real-time streams. This role focuses on researching and designing a unified knowledge layer that supports querying, reasoning, and execution by integrating RAG, knowledge graphs, and agent-based systems. You will work at the intersection of data infrastructure, AI systems, and knowledge modeling, and help define how AI interacts with enterprise data.

Responsibilities

- Research and design unified knowledge representations for enterprise data

- Explore and build RAG-based knowledge systems with high accuracy and low latency

- Develop ontology / semantic layers to bridge data and LLM understanding

- Design knowledge ingestion and update mechanisms (batch + real-time)

- Improve LLM grounding, traceability, and reliability

- Explore agent-based reasoning and execution frameworks

- Prototype and validate new ideas, and bring them into production systems

Qualifications

Minimum

- Individuals who are completing or have recently completed a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline.

- Strong programming skills in Python, Java, or Scala

- Solid understanding of data systems, data modeling, or distributed systems

- Experience in at least one of the following:

  - Data engineering/backend systems

  - Machine learning/LLM systems

- Strong problem-solving skills and curiosity about new technologies

Preferred

- Experience with LLMs, RAG, or vector databases

- Familiarity with knowledge graphs or ontology modeling

- Experience with real-time data processing (Flink, Kafka, etc.)

- Understanding of AI agents or workflow orchestration

- Experience building data platforms or knowledge systems