Member of Technical Staff - Post Training, Applied (Vision)

About the job

This is a rare chance to work at the intersection of frontier vision-language models (VLMs) and real-world deployment. You'll own applied VLM post-training end-to-end for some of the world's largest enterprises while still contributing directly to Liquid's core multimodal model development. Most roles force a trade-off between customer impact and foundational work. This one gives you both: deep ownership over how vision-language models are adapted, evaluated, and shipped, and a direct line into the evolution of Liquid's multimodal post-training stack.

Responsibilities

Act as the technical owner for enterprise customer VLM post-training engagements.

Translate customer requirements into concrete multimodal post-training specifications and workflows.

Design and execute visual data generation, filtering, and quality assessment processes, including image-text pair curation, annotation pipelines, and synthetic data generation for visual tasks.

Run supervised fine-tuning (SFT), preference alignment, and reinforcement learning (RL) workflows for vision-language models.

Design task-specific evaluations for visual understanding, grounding, OCR, document parsing, and other multimodal capabilities. Interpret results and feed learnings back into core post-training pipelines.

Qualifications

Minimum

Hands-on experience with data generation and evaluation for VLM or multimodal post-training.

Experience training or fine-tuning vision-language models using SFT, preference alignment, and/or RL.

Strong intuition for visual data quality, annotation design, and multimodal evaluation.

Familiarity with vision encoders, image-text architectures, and how visual representations interact with language model backbones.

Preferred

Experience with visual grounding, document understanding, OCR, or video understanding tasks.

Experience contributing to shared or general-purpose multimodal post-training infrastructure.

Prior exposure to customer-facing or applied ML delivery environments.

Familiarity with multimodal alignment or RL techniques that go beyond basic supervised fine-tuning.