Generative AI Systems Engineer – Vision-Language Models

About the job

We are seeking a Generative AI Systems Engineer to design, evaluate, and optimize Vision-Language Model (VLM) systems for real-world applications.

Responsibilities

Evaluate pretrained VLMs on domain-specific datasets

Define and justify appropriate evaluation metrics

Analyze model behavior, including systematic failure modes

Implement parameter-efficient fine-tuning techniques (e.g., LoRA, QLoRA)

Optimize training with limited data and compute

Make data-centric and model-centric improvements with clear justification

Design controlled experiments to compare baseline and improved models

Quantify improvements in accuracy, latency, and cost

Provide clear, defensible explanations for observed outcomes

Architect scalable inference pipelines for multimodal models

Optimize inference for low latency, high throughput, and cost efficiency

Implement serving layers (API/service) with reproducible environments

Build pipelines to process and align images, textual queries, and structured metadata

Analyze dataset characteristics, including biases and distribution gaps

Qualifications

B.E./B.Tech. degree

5–7 years of industry experience in ML/AI systems

Strong proficiency in Python and ML frameworks (e.g., PyTorch)

Experience with VLMs, LLMs, or other multimodal models

Understanding of model evaluation and experimentation practices

Familiarity with ML system design (inference, scaling, optimization)

Experience with Vision-Language Models (e.g., LLaVA, BLIP, Flamingo-style architectures)

Hands-on experience with parameter-efficient fine-tuning methods

Knowledge of model optimization techniques: quantization, batching, and caching (e.g., embedding reuse)

Experience with Docker or other containerized deployments

Exposure to large-scale or real-world datasets