Software Engineer, At Scale Compute Analysis

NVIDIA
US, CA, Santa Clara / US, CA, Remote
2026-04-19

About the job

NVIDIA is seeking a candidate to perform data analysis for datacenter applications. This position offers the opportunity to contribute to leading advancements in artificial intelligence and GPU computing through engagement with cutting-edge hardware and software solutions. You will provide insights on large-scale system design using machine learning for GPU-accelerated clusters. We work with the latest accelerated computing and deep learning software and hardware platforms, along with many scientific researchers, developers, and customers, to craft improved workflows and develop new, leading, differentiated solutions. Our team interacts with OS, container technology, GPU compute, and systems specialists to architect, develop, and bring up large-scale performance software components and optimize performance.

Responsibilities

Work with a team passionate about large-scale datacenter development and deployment. We are looking for someone to help analyze large-scale workloads and find opportunities to improve applications and infrastructure.

Provide actionable insights on high-dimensional data to assist engineers in building creative solutions based on NVIDIA technology.

Research and analyze data, identify trends of interest, link changes to recorded events, draw conclusions, create visuals, and help the team make intelligent, data-based decisions.

Work closely with a team to learn more about analysis needs and communicate findings with team members.

Apply machine learning or deep learning techniques for classification and prediction. Implement these techniques and integrate them into existing software toolsets for team usability.

Document work and related materials to help the team build on top of your contributions.

Qualifications

Minimum

4+ years of proven experience debugging and analyzing data and building visuals that show trends of interest.

BS or MS in Engineering, Mathematics, Physics, or Computer Science, or an equivalent program.

Strong programming skills, especially in Python and JavaScript.

Exposure to telemetry tools like Grafana, Elasticsearch, or Splunk.

Understanding of core machine learning techniques and concepts.

Ability to learn quickly and independently, with strong analytical and problem-solving skills.

Desire to learn and be part of a committed and hardworking team.

Strong teamwork and interpersonal skills.

Preferred

Experience with deep learning frameworks like TensorFlow or PyTorch.

Action-driven analytical skills with good attention to detail.

Exposure to high-performance or large-scale computing environments.

Experience visualizing high-dimensional data.

Exposure to Linux-based operating systems.