
ML Systems & Infrastructure Jobs

Roles engineering the systems that train and serve large models: distributed training, ML platforms, and infrastructure at frontier labs. 11 open now, refreshed daily.

Open roles: 11
Companies: 9
Roles listing a salary: 6 (range $153K–$405K)
Roles with a visa mention: 2
Remote roles: 1

These figures are observed across current open postings and refreshed daily; they are not a survey. The salary band is drawn only from roles that publish a range.

ML-systems and infrastructure roles are about the machinery that trains and serves large models rather than the models themselves: distributed-training frameworks, data and checkpoint pipelines, scheduling, and the platform glue that lets research run reliably across thousands of accelerators. They concentrate at frontier labs and platform companies, where the bottleneck is rarely a single GPU and almost always coordination — fault tolerance across week-long runs, throughput at cluster scale, and the gap between 40% and 60% hardware utilization. Distributed-systems instinct usually outweighs deep kernel knowledge in these roles.
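To make the 40% versus 60% utilization gap concrete, here is a rough back-of-envelope sketch. It assumes the common approximation that training a dense transformer costs about 6 FLOPs per parameter per token (forward plus backward, ignoring attention FLOPs). The model size, token count, cluster size and per-accelerator peak FLOP/s below are hypothetical values chosen for illustration, not figures from any posting or lab.

```python
# Back-of-envelope sketch: how hardware utilization changes wall-clock training time.
# Assumes the ~6 * params * tokens FLOPs approximation for dense transformer training.

SECONDS_PER_DAY = 86_400


def model_flops_utilization(n_params, tokens_per_sec, n_accelerators, peak_flops_per_accel):
    """Fraction of the cluster's peak FLOP/s that goes into useful model math."""
    achieved_flops_per_sec = 6 * n_params * tokens_per_sec
    peak_flops_per_sec = n_accelerators * peak_flops_per_accel
    return achieved_flops_per_sec / peak_flops_per_sec


def days_to_train(n_params, total_tokens, n_accelerators, peak_flops_per_accel, utilization):
    """Wall-clock days for one run at a given utilization (no restarts or failures)."""
    total_flops = 6 * n_params * total_tokens
    effective_flops_per_sec = n_accelerators * peak_flops_per_accel * utilization
    return total_flops / effective_flops_per_sec / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: 70B parameters, 1.4T tokens, 2048 accelerators at 1e15 peak FLOP/s each.
    for util in (0.40, 0.60):
        days = days_to_train(70e9, 1.4e12, 2048, 1e15, util)
        print(f"utilization {util:.0%}: {days:.1f} days")
    # prints roughly 8.3 days at 40% and 5.5 days at 60%
```

Because run time scales as 1/utilization, moving from 40% to 60% cuts wall-clock time by a third for the same hardware, which is why small coordination and throughput gains matter at cluster scale.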

Hiring most for this specialty: Cerebras Systems 2 · Scale AI 2 · Anthropic 1 · Applied Intuition 1 · Lightmatter 1 · Nebius 1