This role is about building high-performance AI inference systems using NVIDIA GPUs. You will work on vLLM, an open-source inference framework, optimizing it for the newest models and hardware. Daily work includes writing custom GPU kernels (hand-tuned and compiler-generated), developing compiler infrastructure, and defining benchmarking methodologies like MLPerf. You'll also architect scheduling and orchestration for large-scale multi-GPU, multi-node deployments across clouds. The position combines deep systems engineering with research, pushing the frontier of accelerated computing for AI.
Position for experienced specialists (Senior).
Someone with strong fundamentals in systems programming (C/C++), basic GPU programming, and familiarity with machine learning inference pipelines. They should have at least 2-3 years of experience in performance optimization and be eager to learn GPU kernel development.
A junior engineer without systems programming experience or someone who prefers high-level application development without deep hardware interaction. Also not for those who dislike research or open-source contributions.
A highly collaborative, research-oriented team of experts in AI systems and performance optimization, with world-renowned leadership.
Some fields are missing; check the source for details.