Inference & Model Serving Jobs

Running models in production — inference engines, model serving, and latency/throughput optimization (vLLM, TensorRT and similar). 60 open now, refreshed daily.

open roles: 60
companies: 20
list salary: 27 · $139K–$560K
visa mention: 18
remote: 4

Observed across current open postings, refreshed daily; not a survey. Salary band is drawn only from roles that publish a range.

Inference and model-serving roles own the production side: getting trained models to answer fast and cheaply under real traffic. That means serving engines and runtimes (vLLM, TensorRT-LLM and the like), continuous batching and KV-cache strategy, quantization, and the latency/throughput trade-offs that decide unit economics for anyone shipping an LLM product. These roles concentrate at the labs and inference-platform startups whose revenue scales directly with tokens served per second, so the work rewards people who reason fluently about both model internals and the systems that run them.

Hiring most for this specialty: Anthropic 11 · Cerebras Systems 9 · Databricks 7 · Inworld AI 5 · CoreWeave 4 · Together AI 4

60 roles · refreshed 2026-06-01 11:35 UTC