About the job
At Pika, we are pioneering next-generation creative infrastructure built around real-time video generation and intelligent, agentic platforms. We are seeking an accomplished Real-time Video Researcher to drive forward our mission to make agentic real-time video technology accessible, dynamic, and transformative for millions of creators. As a core member of our research team, you will be integral to designing and building foundational technologies, developing novel approaches for real-time video synthesis, and orchestrating intelligent agentic systems that power scalable, interactive multimedia experiences. You will collaborate closely with engineering and product teams, shaping the future of real-time creative platforms.
Responsibilities
Lead and contribute to research efforts focused on real-time video generation, streaming, editing, and orchestration of agentic platform infrastructure
Design and prototype novel algorithms and architectures for real-time, high-fidelity video synthesis and interactive experiences
Focus heavily on real-time aspects of video generation and synthesis
Work on diffusion model distillation and develop diffusion-based world models for video applications
Train and finetune autoregressive models and diffusion models with a focus on real-time performance
Curate specific datasets, especially for camera motion and human motion data
Collaborate with cross-functional teams to bring research advancements into production-ready technologies
Publish work in top-tier conferences and journals, and communicate results internally and externally
Stay at the cutting edge of the field, monitoring new developments in real-time video, generative AI, multimodal systems, and agentic orchestration
Qualifications
Minimum
5+ years of relevant experience (including research during graduate studies) in real-time video generation, deep learning, or related fields such as image/audio generation, along with deep experience in multimodal systems
Demonstrated impact as first author on major publications in top conferences or journals (e.g., CVPR, ICCV, NeurIPS, SIGGRAPH)
Deep expertise in computer vision, video synthesis, generative models, and machine learning
Hands-on experience with diffusion model distillation and developing diffusion-based world models
Experience training and finetuning autoregressive models, especially for real-time applications
Strong capability and experience in data curation, especially for camera motion and human motion datasets
Experience developing and deploying real-time video systems and/or agentic orchestration infrastructure
Strong programming and prototyping skills (Python, PyTorch, TensorFlow, etc.)
Passion for building creative tools and platforms that empower users
Excellent communication and collaboration skills