About the job
We are seeking a highly motivated and innovative Data Engineer to join our dynamic team. A key aspect of your work will involve data development, including collecting, curating, and analyzing customer interaction data to build AI models that understand individual preferences and behaviors. You will collaborate closely with cross-functional teams to create next-generation solutions that push the boundaries of natural language understanding, intelligent dialogue management, and tailored conversational experiences driven by robust data insights.
Responsibilities
Design, build, and maintain robust, scalable data pipelines.
Perform data research to identify data sources within the ecosystem and apply enrichments to formulate meaningful data points.
Implement, optimize, and maintain scheduled jobs, batch processors, and real-time data ingestion pipelines.
Implement event-driven architecture to react to events.
Optimize and fine-tune database performance to ensure it can support big data with ideal response times.
Design data schemas that can evolve over time and align with strategic goals.
Design, implement, and optimize microservices to expose the data to consuming applications.
Design caching and data management practices to improve performance.
Ensure the data architecture supports the business requirements.
Explore new opportunities for data acquisition and enhance data collection procedures.
Explore and identify appropriate segmentation strategies to support RAG implementations.
Demonstrate a commitment to learning and adopting emerging technologies, with a particular focus on agentic AI development.
Qualifications
Minimum
Over 10 years of experience as a Data Engineer or Software Engineer, with expertise in software engineering, data engineering, data warehousing, data research, and requirements gathering.
Demonstrated expertise in programming languages such as Python and PySpark for executing data engineering tasks.
Exceptional analytical and problem-solving skills, particularly in handling unstructured raw data and synthesizing meaningful patterns.
Hands-on experience in developing complete ETL pipelines, from source to destination, including data cleansing, transformation, and enrichment.
Proficiency in PySpark for engineering data pipelines using Databricks on AWS or Azure.
Technical prowess in data modeling, data mining, data architectures, and data warehousing.
Proficiency in event-driven architectures in the cloud, preferably Azure (AWS or GCP is also acceptable).
Proficiency in real-time data processing using Kafka.
Expertise in a range of database and data warehouse technologies such as SQL (MySQL, PostgreSQL), NoSQL databases (MongoDB, Azure Cosmos DB, Bigtable), and data warehouse/data lake technologies (Snowflake, BigQuery).
Proficiency with OAuth providers and with cloud auditing and logging tools for monitoring and troubleshooting.
Expertise in microservices to expose data to consuming applications.
Familiarity with Node.js, TypeScript, and GraphQL is desired; willingness to learn these technologies is strongly preferred.
Familiarity with Linux and Docker.
Familiarity with agentic AI application development using frameworks such as ADK with Python.
Experience with Power BI and other visualization tools such as Kibana, Grafana, or Tableau.
Knowledge of cloud services (AWS, Google Cloud, or Azure) and understanding of distributed data processing frameworks.
Ability to manage and delegate work across delivery teams to meet priorities.
Skilled in client engagements, deciphering client business needs, and providing data solution recommendations.
Excellent communication skills, with experience in designing, developing, and delivering presentations.
Preferred
Bachelor of Science in Computer Science, Statistics, Math, or Scientific Computing. A developer nanodegree or certification with equivalent experience is a value-add.