About the job
We are seeking a Generative AI Systems Engineer to design, evaluate, and optimize Vision-Language Model (VLM) systems for real-world applications.
Responsibilities
Evaluate pretrained VLMs on domain-specific datasets
Define and justify appropriate evaluation metrics
Analyze model behavior, including systematic failure modes
Implement parameter-efficient fine-tuning techniques (e.g., LoRA, QLoRA)
Optimize training under limited data and compute constraints
Make data-centric and model-centric improvements with clear justification
Design controlled experiments to compare baseline vs. improved models
Quantify improvements in accuracy, latency, and cost
Provide clear, defensible explanations for observed outcomes
Architect scalable inference pipelines for multimodal models
Optimize inference for low latency, high throughput, and cost efficiency
Implement serving layers (API/service) with reproducible environments
Build pipelines to process and align images, textual queries, and structured metadata
Analyze dataset characteristics, including biases and distribution gaps
Qualifications
Minimum
B.E./B.Tech
5–7 years of industry experience in ML/AI systems
Strong proficiency in Python and ML frameworks (e.g., PyTorch)
Experience with VLMs, LLMs, or multimodal models
Understanding of model evaluation and experimentation practices
Familiarity with ML system design (inference, scaling, optimization)
Preferred
Experience with Vision-Language Models (e.g., LLaVA, BLIP, Flamingo-style architectures)
Hands-on experience with parameter-efficient fine-tuning methods
Knowledge of model optimization techniques such as quantization, batching, and caching (e.g., embedding reuse)
Experience with Docker / containerized deployments
Exposure to large-scale or real-world datasets