Inference & Model Serving Jobs

Roles running models in production: inference engines, model serving, and latency/throughput optimization (vLLM, TensorRT, and similar). 45 open now, refreshed daily.

open roles: 45
companies: 21
list salary: 24 · $139K–$560K
visa mention: 12
remote: 4

Observed across current open postings, refreshed daily; not a survey. Salary band is drawn only from roles that publish a range.

Inference and model-serving roles own the production side: getting trained models to answer quickly and cheaply under real traffic. That means serving engines and runtimes (vLLM, TensorRT-LLM, and the like), continuous batching and KV-cache strategy, quantization, and the latency/throughput trade-offs that decide unit economics for anyone shipping an LLM product. These roles concentrate at the labs and inference-platform startups whose revenue scales directly with tokens served, so the work rewards people who reason fluently about both model internals and the systems that run them.
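The latency/throughput trade-off mentioned above can be illustrated with a toy cost model. This is only a sketch: the class `DecodeModel`, the helper `sweep`, and every number in it (fixed step cost, per-sequence cost, GPU price) are hypothetical values chosen for illustration, not measurements from any real engine or posting. It assumes each decode step emits one token per sequence in the batch and that step time grows linearly with batch size.

```python
from dataclasses import dataclass


@dataclass
class DecodeModel:
    """Toy cost model for one decode step on a single GPU (all numbers hypothetical)."""
    fixed_ms: float = 20.0         # per-step cost that does not depend on batch size
    per_seq_ms: float = 0.5        # extra per-step cost for each sequence in the batch
    gpu_usd_per_hour: float = 2.0  # assumed rental price of the GPU

    def step_ms(self, batch: int) -> float:
        # Time for one decode step; also the per-token latency each request sees.
        return self.fixed_ms + self.per_seq_ms * batch

    def tokens_per_second(self, batch: int) -> float:
        # Each step emits one token per sequence in the batch.
        return batch * 1000.0 / self.step_ms(batch)

    def usd_per_million_tokens(self, batch: int) -> float:
        tokens_per_hour = self.tokens_per_second(batch) * 3600.0
        return self.gpu_usd_per_hour / tokens_per_hour * 1_000_000


def sweep(model: DecodeModel, batches=(1, 8, 32, 128)):
    """Return (batch, latency_ms, tokens_per_second, usd_per_million_tokens) rows."""
    return [
        (b, model.step_ms(b), model.tokens_per_second(b), model.usd_per_million_tokens(b))
        for b in batches
    ]


if __name__ == "__main__":
    for b, latency, tps, cost in sweep(DecodeModel()):
        print(f"batch={b:4d}  latency={latency:6.1f} ms/token  "
              f"throughput={tps:8.1f} tok/s  cost=${cost:6.2f}/M tokens")
```

Under these assumptions, larger batches raise per-token latency while lowering cost per token, which is the tension that batching, KV-cache, and quantization work tries to manage.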

Hiring most for this specialty: Anthropic 9 · Databricks 5 · Together AI 5 · Baseten 4 · CoreWeave 4 · OpenAI 2

45 roles · refreshed 2026-07-20 08:58 UTC

Member of Technical Staff (Software Engineer, Inference & Training Platform) · Perplexity · San Francisco · salary not listed · 3d

Research Engineer, Post-Training Inference · Together AI · San Francisco · $200K–$290K · 13d

Member of Technical Staff - RL Inference · xAI · Palo Alto · salary not listed · 13d

Member of Technical Staff - Inference Research · Modal · New York · salary not listed · 14d

Principal LLM Inference Engineer · d-Matrix · Santa Clara · salary not listed · 19d

Staff + Senior Software Engineer, Inference Deployment · Anthropic · San Francisco · $320K–$485K · 20d

Staff+ Software Engineer, Inference Runtime · Anthropic · Remote-Friendly (Trav… · $405K–$485K · 1mo

Software Engineer - BIS (Baseten Inference Stack) · Baseten · San Francisco · salary not listed · 1mo

Staff + Senior Software Engineer, Inference · Anthropic · San Francisco · $320K–$485K · 1mo

Staff + Sr. Software Engineer, Cloud Inference Launch Engineering · Anthropic · San Francisco · $320K–$485K · 1mo

Distributed LLM Inference Engineer · Anyscale · San Francisco · salary not listed · 1mo

Staff + Sr. Software Engineer, Cloud Inference · Anthropic · San Francisco · $320K–$485K · 1mo

Performance Engineer, Inference Systems · Anthropic · San Francisco · $350K–$850K · ~2mo

Solution Architect (AI/LLM Inference) · Baseten · San Francisco · salary not listed · 2mo

Software Engineer, Productivity - Inference Runtime · OpenAI · San Francisco · salary not listed · 2mo

Staff Software Engineer, Inference · CoreWeave · Sunnyvale · $188K–$275K · 2mo

Performance Engineer (Inference, Training & GPU) · World Labs · San Francisco · salary not listed · 2mo

Lead Member of Technical Staff, Inference Infrastructure · Cohere · San Francisco · salary not listed · 2mo

Software Engineer, Inference - Performance Optimization · OpenAI · San Francisco · salary not listed · 2mo

Software Engineer - Voice AI (Inference Runtime) · Baseten · San Francisco · salary not listed · 2mo

Software Engineer – AI Inference Engine · FriendliAI · Seoul · salary not listed · 2mo

Applied AI Inference Engineer · Baseten · San Francisco · salary not listed · 2mo

Senior Software Engineer, Inference · Anthropic · London +1 · salary not listed · 4mo

Staff Software Engineer, Inference · Anthropic · Dublin +1 · salary not listed · 4mo

Senior Software Engineer I, Inference · CoreWeave · Sunnyvale · $139K–$204K · 5mo

Research Engineer, Infrastructure, Inference · Thinking Machines · San Francisco · $350K–$475K · 7mo

Software Engineer, Inference AI/ML · CoreWeave · Sunnyvale · salary not listed · 8mo

Staff Software Engineer, Foundational Model Serving · Databricks · San Francisco · $192K–$260K · 8mo

Solutions Architect (Inference) · Together AI · London · salary not listed · 9mo

Senior Software Engineer, Model Serving · Databricks · San Francisco · $166K–$225K · 9mo

Staff Software Engineer, Model Serving · Databricks · San Francisco · $192K–$260K · 9mo

Senior Software Engineer, Inference · CoreWeave · Sunnyvale · $152K–$204K · 9mo

Staff Software Engineer - GenAI inference · Databricks · San Francisco · $190K–$232K · 9mo

Software Engineer - GenAI inference · Databricks · San Francisco · $142K–$204K · 9mo

Member of Technical Staff - Inference · Prime Intellect · Remote · salary not listed · 10mo

Senior Backend Engineer, Inference Platform · Together AI · San Francisco · $160K–$250K · 11mo

Sr Engineer, Server Inference · Tenstorrent · Belgrade · salary not listed · 12mo

LLM Inference Deployment Engineer · EnCharge AI · U.S.-Remote · $180K–$240K · 12mo

Senior Site Reliability Engineer - Token Factory (Inference Platform) · Nebius · Amsterdam · salary not listed · 13mo

Engineering Manager, Inference · Anthropic · San Francisco · $425K–$560K · 13mo

LLM Inference Frameworks and Optimization Engineer · Together AI · San Francisco · $160K–$230K · 16mo

AI Infrastructure Engineer, Model Serving Platform · Scale AI · San Francisco · $180K–$225K · 17mo

Software Engineer - Training/Inference (C++) · xAI · Palo Alto · $180K–$440K · ~21mo

Member of Technical Staff - Model Serving / API Backend Engineer · Black Forest Labs · Freiburg (Germany) · $180K–$300K · 22mo

Machine Learning Engineer - Inference · Together AI · San Francisco · $160K–$230K · 25mo