Machine Learning Engineer Job at Evolve Group, Fremont, CA

  • Evolve Group
  • Fremont, CA

Job Description

Machine Learning Engineer

Tech start-up

San Francisco based

We’ve partnered with one of the most ambitious and technically rigorous AI research labs in the world. Based in San Francisco, this team is building foundation models entirely from scratch.

They are now hiring ML Infrastructure Engineers to design and scale the systems that power large-scale, distributed model training. If you’ve built infrastructure that runs across hundreds of GPUs, thrive under technical complexity, and want to work side-by-side with elite AI researchers — this is the role.

Key Responsibilities:

  • Build and scale distributed training systems for large-scale model training across LLMs, vision, and robotics.
  • Set up and run large-scale training across many GPUs using tools like Kubernetes, DeepSpeed, and FSDP.
  • Troubleshoot system issues (GPU errors, network problems) and build tools to monitor and recover from failures.
  • Optimize PyTorch pipelines, sharding, and sampling strategies.
  • Collaborate closely with researchers to support novel model training at scale.

Requirements:

  • 3–15 years in ML infrastructure, systems, or research engineering roles.
  • Proven experience scaling distributed training for large models.
  • Strong skills in PyTorch, CUDA, NCCL, and Kubernetes.
  • Familiar with setting up distributed training clusters.
  • Deep understanding of PyTorch dataloaders, data sharding, and sampling.
  • Strong communicator with a collaborative, mission-driven mindset.

This is a fully in-person role based in San Francisco. It's ideal for engineers excited to build at the edge of what's possible in AI.

Job Tags

Immediate start
