This role is about building high-performance AI inference systems using NVIDIA GPUs. You will work on vLLM, an open-source inference framework, optimizing it for the newest models and hardware. Daily work includes writing custom GPU kernels (hand-tuned and compiler-generated), developing compiler infrastructure, and defining benchmarking methodologies like MLPerf. You'll also architect scheduling and orchestration for large-scale multi-GPU, multi-node deployments across clouds. The position combines deep systems engineering with research, pushing the frontier of accelerated computing for AI.
Position for experienced specialists (Senior).
Someone with strong fundamentals in systems programming (C/C++), basic GPU programming, and familiarity with machine learning inference pipelines. They should have at least 2-3 years of experience in performance optimization and be eager to learn GPU kernel development.
A junior engineer without systems programming experience or someone who prefers high-level application development without deep hardware interaction. Also not for those who dislike research or open-source contributions.
A highly collaborative, research-oriented team of experts in AI systems and performance optimization, with world-renowned leadership.
Some fields are missing; check the source for details.