Principal, Data Engineer

Walmart Global Tech
IN TN CHENNAI Home Office Capita Land
2026-05-13
Full time

About the job

As a Principal Data Engineer, you are a hands-on senior engineer on the AI and Data team within Walmart Global Tech, responsible for developing and contributing to high-impact enterprise data platform deliverables for Global Marketplace at global scale. Your work directly enables sellers at Walmart and serves hundreds of millions of customers and associates across digital and physical channels every day. You operate at the intersection of retail scale, near real-time intelligence, and Agentic AI, translating our long-term platform strategy into durable business impact.

Responsibilities

Transform and evolve Walmart’s core data platform using Agentic AI across batch, streaming, and hybrid systems to support data-powered Global Marketplace applications, analytics, operational intelligence, and AI-native workloads.

Develop data platforms that scale to billions of events per day, supporting both near real-time retail decisions and deep analytical insights.

Develop agent-ready data, ensuring data products are discoverable, semantically rich, and optimized for LLMs, copilots, and multi-agent workflows.

Influence multiple teams and organizations through technical leadership and clear architectural direction, without direct authority.

Develop fault-tolerant, resilient systems that support mission-critical retail and marketplace operations.

Lead modernization of legacy data lakes and pipelines into composable, event-driven, and agent-aware platforms.

Enable incremental modernization that delivers immediate value while reducing long-term complexity and operational risk.

Mentor Staff and Senior Engineers across the Walmart Global Tech AI and Data team.

Develop APIs and services that make data available to dependent systems and applications.

Develop and implement best-in-class data pipelines to ensure on-time availability of data and insights.

Data Source Identification: Helps identify the most suitable source for data that is fit for purpose.

Data Modeling: Analyzes complex data elements, systems, data flows, dependencies, and relationships to contribute to conceptual, logical, and physical data models.

Enable data scientists, business and product partners to fully leverage our platform.

Translate business requirements into code, specific analytical reports, and tools.

Design, build, test, and deploy cutting-edge solutions at scale, impacting a multi-billion-dollar business.

Adopt and build AI agents for the discovery and development of data assets.

Work closely with the product owner and technical lead, and play a major role in the overall delivery of assigned projects and enhancements.

Learn and research on the go, working on new requests and projects as well as production support.

Provide business insights while leveraging internal tools and systems, databases and industry data.

Own a data subject and ensure its availability and accuracy.

Work closely with all business units and engineering teams to develop a strategy for long-term data platform architecture.

Design and implement data tools for analytics and data scientist team members to help them build and optimize within the Walmart ecosystem.

Creates training documentation and trains end users on data modeling. Oversees the tasks of less experienced programmers and provides system troubleshooting support.

Document requirements, data lineage, and subject matter in both business and technical terminology.

Applied Business Acumen: Provides recommendations to business stakeholders to solve complex business issues. Develops business cases, translates business requirements into projects, activities, and tasks, and aligns them to overall business strategy.

Participate in designing data application architecture with senior architects.

Data Governance: Establishes, modifies, and documents data governance practices in partnership with business stakeholders and peers.

Promotes and supports company policies, procedures, mission, values, and standards of ethics and integrity by training and providing direction to others in their use and application, ensuring compliance with them.

Recommend ways to improve data reliability, efficiency, and quality.

Qualifications

Minimum

Bachelor’s degree in computer science or a related discipline with 12+ years of experience.

Minimum 10 years of experience in Big Data and distributed computing.

Minimum 10 years of experience programming in Java, Scala, Python, Spring Boot, and Node.js.

Strong experience with cloud-native ecosystems (GCP), including BigQuery, Serverless, Pub/Sub, or equivalent.

Expertise in batch and streaming technologies (Kafka, Spark Structured Streaming, Flink, Druid, etc.).

Experience working with hybrid architectures that support both real-time operations and analytical workloads.

Agentic & AI-Ready Systems Knowledge

Strong understanding of semantic modeling, embeddings, knowledge graphs, and vector indexing.

Experience supporting RAG, context-aware AI, and agent orchestration through data platform design.

Ability to reason about schema design, latency, storage formats, and their impact on AI behavior and outcomes.

Strong Engineering Foundation

Fluency in Python, Java, or Scala.

Deep experience with Spark/PySpark and large-scale SQL optimization.

Strong systems thinking, performance tuning, and operational excellence mindset.

Influence & Communication

Demonstrated ability to lead through influence in complex, matrixed organizations.

Executive-level communication skills, with the ability to connect technical strategy to business and customer impact.

Proven experience building pipelines on the Big Data stack: Hadoop, Spark, Hive, Presto, Kafka, Airflow Scheduler, and the GCP suite of data tools.

Deep understanding of the Hadoop ecosystem, strong conceptual knowledge of Hadoop architecture components, and experience working with at least one Big Data technology in Java, Python, or Scala.

Strong knowledge of deploying and managing applications in GCP.

Strong scripting skills for processing large amounts of data, and high proficiency in SQL.

Preferred

Solid knowledge of Linux systems with the ability to troubleshoot issues in complex, distributed, multi-tier architectures.

Experience building secure, scalable, and highly available services.

Experience with data science and machine learning is a plus.

Experience building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and GCP big data technologies.

Excellent hands-on working knowledge of and experience with object-oriented and functional programming languages (Python, Java, C++, Scala), and with the development of microservices.

Good written and verbal communication skills.